[ https://issues.apache.org/jira/browse/JENA-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16954380#comment-16954380 ]
Andy Seaborne commented on JENA-1769:
-------------------------------------
That dataset also shows the problem, as expected.
What was happening is that the default implementation caused all nodes to be
read. With a million different terms, more than the node table cache holds,
there is external I/O.
When I first tried to reproduce this, I had a million quads, expecting the cost
to be due to small Java objects. In that case I got (cold) 180ms vs ~2s, which is
somewhat less extreme. With all distinct terms in triples (3 million terms), I
get ~20s.
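
To illustrate the cache effect described above, here is a minimal sketch in plain Java. It is not Jena code: the class `NodeCacheSketch`, the `LruCache` helper, and the `countMisses` method are hypothetical names, and a simple LRU map stands in for the node table cache on the assumption that it behaves roughly like one. Each miss marks the point where a real store would have to read from disk.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class NodeCacheSketch {
    /** Hypothetical stand-in for a node table cache: an LRU map that counts misses. */
    static final class LruCache extends LinkedHashMap<Long, String> {
        private final int capacity;
        long misses = 0;

        LruCache(int capacity) {
            super(16, 0.75f, true); // access order, so the eldest entry is the least recently used
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<Long, String> eldest) {
            return size() > capacity;
        }

        String lookup(long nodeId) {
            String term = get(nodeId);
            if (term == null) {
                misses++;                // a real store would do external I/O here
                term = "term-" + nodeId; // placeholder for the term read from disk
                put(nodeId, term);
            }
            return term;
        }
    }

    /** Performs `lookups` node reads cycling over `distinctTerms` ids; returns the miss count. */
    static long countMisses(int cacheSize, int distinctTerms, int lookups) {
        LruCache cache = new LruCache(cacheSize);
        for (int i = 0; i < lookups; i++) {
            cache.lookup(i % distinctTerms);
        }
        return cache.misses;
    }

    public static void main(String[] args) {
        // Few distinct terms: everything fits in the cache after the first read.
        System.out.println("fits in cache:  " + countMisses(1000, 100, 10_000) + " misses");
        // More distinct terms than cache slots: entries are evicted before they are reused.
        System.out.println("exceeds cache:  " + countMisses(1000, 5000, 10_000) + " misses");
    }
}
```

With the same number of reads, the first case misses only once per distinct term, while the second misses on every read because a sequential scan evicts each entry before it comes round again. The sizes used here are arbitrary and much smaller than the figures in this comment.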
(Aside: maybe time to increase the default cache sizes again - they creep up
over the years as "normal" machines get bigger)
> Dataset#listNames slow for large TDB2 datasets
> ----------------------------------------------
>
> Key: JENA-1769
> URL: https://issues.apache.org/jira/browse/JENA-1769
> Project: Apache Jena
> Issue Type: Bug
> Components: TDB2
> Affects Versions: Jena 3.13.0
> Reporter: Damien Obrist
> Assignee: Andy Seaborne
> Priority: Major
> Labels: performance
> Time Spent: 10m
> Remaining Estimate: 0h
>
> With Jena 3.13.0, the running time of {{Dataset#listNames}} has increased
> significantly for TDB2 datasets.
> I have compared the running times for a sample TDB2 dataset containing
> *1'000'000 triples*. I have observed a running time of *~270ms* with Jena
> 3.12.0 and *~13.5s* with Jena 3.13.0.
> We're using a dataset with many millions of triples and for our use case, the
> running time has increased from seconds to minutes.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)