O(n log n) sounds about right.  Based on the most recent results I get this 
quadratic fit 
<https://www.wolframalpha.com/input/?i=quadratic+fit+%7B23%2C908.85%7D%2C%7B69%2C+3584.09%7D%2C%7B115%2C7153.94%7D>
for the expected time to process x files of 10,000 articles each.  With an 
x^2 coefficient of 0.21 that's only just above linear, but once you're up to 
over 4,600 files it's still way too damn long.
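
Spelled out, the quadratic through those three points -- (23, 908.85), 
(69, 3584.09), (115, 7153.94) -- is roughly

    t(x) = 0.2114 x^2 + 38.71 x - 93.3

where x is the number of files and t is in the same units as the measured 
times.  Plugging in x = 4,670 (the full 46.7 million articles) gives about 
4.8 million of those units, hence "way too damn long".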

I will try the CREATE TABLE ... AS SELECT DISTINCT thing and get back to 
you.  Thanks.
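
Concretely, the plan is something along these lines (table and column names 
are just placeholders for my actual schema):

    -- 1. bulk-load every parsed row into an unconstrained staging table
    INSERT INTO tempdoc (id, title, body) VALUES (?, ?, ?);

    -- 2. deduplicate into the real table in a single pass
    CREATE TABLE document AS
        SELECT DISTINCT id, title, body FROM tempdoc;

    -- 3. add constraints only after the data is in place,
    --    so they are built once rather than checked per insert
    ALTER TABLE document ADD PRIMARY KEY (id);
    DROP TABLE tempdoc;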

On Monday, October 7, 2019 at 7:40:01 AM UTC-4, Noel Grandin wrote:
>
> On 2019/10/07 1:32 PM, Tim Fielder wrote:
> > The problem with this approach is that since the tables are indexed, the 
> > insert time grows quadratically with the size of the table.  As a result 
> > I can handle 230,000 articles in about 2 hours, but the full 46.7 million 
> > will take at least 300 days. 
>
> That should not be the case; insert time should be something like 
> O(n log n), so not sure why it is so slow for you. 
>
>
> > In order to defer the application of constraints until after I fully 
> > complete parsing, the schema becomes simply: 
>
> If you are going to do something like this, then rather: 
>
> (*) insert all rows into tempdoc 
>
> (*) CREATE TABLE document AS SELECT DISTINCT .... FROM tempdoc 
>
> (*) add constraints to document 
>
> and similarly for other tables. 
