O(n log n) sounds about right.  Based on the most recent results, Wolfram 
Alpha gives me this quadratic fit 
<https://www.wolframalpha.com/input/?i=quadratic+fit+%7B23%2C908.85%7D%2C%7B69%2C+3584.09%7D%2C%7B115%2C7153.94%7D> 
for the expected time to process X files of 10,000 articles each.  With an 
x^2 coefficient of only 0.21 that's just barely superlinear, but extrapolated 
out to over 4,600 files it's still way too damn long.
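
For the record, fitting the quadratic exactly through those three data 
points {23, 908.85}, {69, 3584.09}, {115, 7153.94} gives roughly

    t(x) ~= 0.211 x^2 + 38.7 x - 93.3

so at x = 4,670 files (the full 46.7 million articles) that extrapolates to 
about 0.211 * 4670^2 + 38.7 * 4670, i.e. roughly 4.8 million in whatever 
units those timings are in (about 55 days, if they're seconds).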

I will try the CREATE TABLE ... AS SELECT DISTINCT thing and get back to 
you.  Thanks.
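
In case it helps anyone following along, here's a rough sketch of what I'm 
planning to run; the table and column names are just placeholders from my 
schema:

    -- stage 1: bulk-load everything into an unindexed, unconstrained staging table
    CREATE TABLE tempdoc(id BIGINT, title VARCHAR, body CLOB);
    -- ...batched INSERT INTO tempdoc VALUES (?, ?, ?) from the parser...

    -- stage 2: dedupe into the real table in a single pass
    CREATE TABLE document AS SELECT DISTINCT id, title, body FROM tempdoc;

    -- stage 3: add constraints and indexes only after the data is in place
    ALTER TABLE document ADD CONSTRAINT pk_document PRIMARY KEY (id);
    CREATE INDEX idx_document_title ON document(title);
    DROP TABLE tempdoc;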

On Monday, October 7, 2019 at 7:40:01 AM UTC-4, Noel Grandin wrote:
>
>
>
> On 2019/10/07 1:32 PM, Tim Fielder wrote: 
>  > The problem with this approach is that since the tables are indexed, 
>  > the insert time grows quadratically with the size of the table.  As a 
>  > result I can handle 230,000 articles in about 2 hours, but the full 
>  > 46.7 million will take at least 300 days. 
>
> That should not be the case; insert time should be something like 
> O(n log n), so I'm not sure why it is so slow for you. 
>
>
>  > In order to defer the application of constraints until after I fully 
>  > complete parsing, the schema becomes simply: 
>
> If you are going to do something like this, then rather do the following: 
>
> (*) insert all rows into tempdoc 
>
> (*) CREATE TABLE document AS SELECT DISTINCT ... FROM tempdoc 
>
> (*) add constraints to document 
>
> and similarly for other tables. 
>
>
