[ 
https://issues.apache.org/jira/browse/JENA-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16954380#comment-16954380
 ] 

Andy Seaborne commented on JENA-1769:
-------------------------------------

That dataset also shows the problem, as expected.

What's happening is that the default implementation causes all nodes to be 
read, and with a million different terms (more than the node table cache 
holds) that means external I/O.
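To make the mechanism concrete, here is a toy Java model. It is not Jena's code. It assumes a node table that maps numeric NodeIds to terms behind a fixed-size LRU cache, and it counts each cache miss as one simulated disk read. `NodeTable`, `buildQuads`, `listNamesDecodingAll` and `listNamesGraphIdsFirst` are hypothetical names. The first function decodes every node of every quad, as the comment describes the default implementation doing. The second collects distinct graph ids first and decodes only those, shown only for contrast; this comment does not say how the actual TDB2 fix works.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ListNamesCacheSketch {

    // Hypothetical node table: NodeId -> term, fronted by a fixed-size LRU cache.
    // Each cache miss stands in for a read from the on-disk node table.
    static final class NodeTable {
        private final int capacity;
        private final LinkedHashMap<Long, String> cache;
        private long misses = 0;

        NodeTable(int capacity) {
            this.capacity = capacity;
            // accessOrder = true makes the map evict the least recently used entry
            this.cache = new LinkedHashMap<Long, String>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<Long, String> eldest) {
                    return size() > NodeTable.this.capacity;
                }
            };
        }

        String decode(long id) {
            String term = cache.get(id);
            if (term == null) {
                misses++;               // simulated external I/O
                term = "term-" + id;    // stand-in for the stored RDF term
                cache.put(id, term);
            }
            return term;
        }

        long misses() {
            return misses;
        }
    }

    // Each quad is {graph, subject, predicate, object} as NodeIds.
    // Graph ids are 0..numGraphs-1; every subject/predicate/object id is distinct.
    static List<long[]> buildQuads(int numQuads, int numGraphs) {
        List<long[]> quads = new ArrayList<>(numQuads);
        for (int i = 0; i < numQuads; i++) {
            long base = numGraphs + 3L * i;
            quads.add(new long[] {i % numGraphs, base, base + 1, base + 2});
        }
        return quads;
    }

    // Decodes every node of every quad, then keeps the distinct graph names.
    static Set<String> listNamesDecodingAll(List<long[]> quads, NodeTable nodes) {
        Set<String> names = new LinkedHashSet<>();
        for (long[] q : quads) {
            String g = nodes.decode(q[0]);
            nodes.decode(q[1]);
            nodes.decode(q[2]);
            nodes.decode(q[3]);
            names.add(g);
        }
        return names;
    }

    // Collects distinct graph NodeIds first and decodes only those.
    static Set<String> listNamesGraphIdsFirst(List<long[]> quads, NodeTable nodes) {
        Set<Long> graphIds = new LinkedHashSet<>();
        for (long[] q : quads) {
            graphIds.add(q[0]);
        }
        Set<String> names = new LinkedHashSet<>();
        for (long id : graphIds) {
            names.add(nodes.decode(id));
        }
        return names;
    }

    public static void main(String[] args) {
        List<long[]> quads = buildQuads(10_000, 3);
        NodeTable a = new NodeTable(1_000);
        Set<String> namesA = listNamesDecodingAll(quads, a);
        NodeTable b = new NodeTable(1_000);
        Set<String> namesB = listNamesGraphIdsFirst(quads, b);
        System.out.println(namesA + " misses=" + a.misses());
        System.out.println(namesB + " misses=" + b.misses());
    }
}
```

With 10,000 quads in 3 graphs and a 1,000-entry cache, both functions return the same three names. The first misses on every subject, predicate and object (30,003 misses in total). The second misses only on the 3 graph names. Because the terms are all distinct, a larger cache would not help the first version. It only pays off when terms repeat within the cache's reach.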

When I first tried to reproduce this, I had a million quads, expecting the cost 
to be due to small Java objects. In that case I got (cold) 180ms vs ~2s, which 
is somewhat less extreme. With all distinct terms in triples (3 million terms), 
I get ~20s.

(Aside: maybe it's time to increase the default cache sizes again; they creep 
up over the years as "normal" machines get bigger.)


> Dataset#listNames slow for large TDB2 datasets
> ----------------------------------------------
>
>                 Key: JENA-1769
>                 URL: https://issues.apache.org/jira/browse/JENA-1769
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: TDB2
>    Affects Versions: Jena 3.13.0
>            Reporter: Damien Obrist
>            Assignee: Andy Seaborne
>            Priority: Major
>              Labels: performance
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> With Jena 3.13.0, the running time of {{Dataset#listNames}} has increased 
> significantly for TDB2 datasets.
> I have compared the running times for a sample TDB2 dataset containing 
> *1'000'000 triples*. I have observed a running time of *~270ms* with Jena 
> 3.12.0 and *~13.5s* with Jena 3.13.0.
> We're using a dataset with many millions of triples and for our use case, the 
> running time has increased from seconds to minutes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
