O(n log n) sounds about right. Based on the most recent results, I get this quadratic fit <https://www.wolframalpha.com/input/?i=quadratic+fit+%7B23%2C908.85%7D%2C%7B69%2C+3584.09%7D%2C%7B115%2C7153.94%7D> for the expected time to process x files of 10,000 articles each. With an x^2 coefficient of 0.21 that's only just above linear, but once you're up to over 4,600 files it's still way too damn long.
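Spelling that fit out explicitly (a quadratic through the three measured points above, and taking the timing unit to be seconds):

  t(x) ≈ 0.2114 x^2 + 38.71 x - 93.3

  t(4670) ≈ 0.2114 * 4670^2 + 38.71 * 4670 - 93.3
          ≈ 4.79 million seconds
          ≈ 55 days

So even at "only slightly superlinear", the full 46.7 million articles (about 4,670 files) is measured in weeks.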
I will try the CREATE TABLE ... AS SELECT DISTINCT thing and get back to you (a sketch of the pattern follows the quote below). Thanks.

On Monday, October 7, 2019 at 7:40:01 AM UTC-4, Noel Grandin wrote:
>
> On 2019/10/07 1:32 PM, Tim Fielder wrote:
> > The problem with this approach is that since the tables are indexed,
> > the insert time grows quadratically with the size of the table. As a
> > result I can handle 230,000 articles in about 2 hours, but the full
> > 46.7 million will take at least 300 days.
>
> That should not be the case, insert time should be something like
> O(n log n). So not sure why it is so slow for you.
>
> > In order to defer the application of constraints until after I fully
> > complete parsing, the schema becomes simply:
>
> If you are going to do something like this, then rather:
>
> (*) insert all rows into tempdoc
> (*) CREATE TABLE document AS SELECT DISTINCT .... FROM tempdoc
> (*) add constraints to document
>
> and similarly for other tables.
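Concretely, here's the pattern as I understand Noel's suggestion. The column names are placeholders, since my real schema is wider:

-- 1) Staging table with no keys or indexes, so inserts are append-only
--    and pay no index-maintenance cost as the table grows.
CREATE TABLE tempdoc (
    doc_id BIGINT,
    title  VARCHAR(1024)
);

-- ... bulk INSERT all parsed rows into tempdoc here ...

-- 2) Build the real table in a single pass, removing duplicates.
CREATE TABLE document AS
    SELECT DISTINCT doc_id, title FROM tempdoc;

-- 3) Add constraints once, after all the data is in place.
ALTER TABLE document ADD CONSTRAINT pk_document PRIMARY KEY (doc_id);

-- 4) The staging table is no longer needed.
DROP TABLE tempdoc;

That way the index/constraint work happens once over the finished data instead of on every insert as the table grows.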
