[
https://issues.apache.org/jira/browse/JENA-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16227076#comment-16227076
]
Michał Woźniak commented on JENA-1414:
--------------------------------------
The proposed solution seems as a workaround to the issue that the operation has
now O(n+m) complexity. It used to be O(m) in Jena 3.1.
I think I see a way to adapt GraphUtil.deleteFrom(Graph,Graph) [see
graph_util_improve.patch]. Instead of computing targetGraph.size() by iterating
over all triples it is enough to stop the iteration once the counter exceeds
some threshold. See the method below:
{code:java}
static int compareSizeTo(Graph g, int size) {
int counter = 0;
ExtendedIterator<Triple> it = g.find(Triple.ANY);
try {
while (it.hasNext()) {
it.next();
counter += 1;
if (counter > size) {
return 1;
}
}
return counter == size ? 0 : -1;
} finally {
it.close();
}
}
{code}
Similar trick can be used to compare sizes of two graphs - get two iterators
and start consuming them until the smaller ends. This way O(min(n,m))
complexity can be achieved.
{code:java}
static int compareSizes(Graph g1, Graph g2) {
ExtendedIterator<Triple> it1 = g1.find(Triple.ANY);
try {
ExtendedIterator<Triple> it2 = g2.find(Triple.ANY);
try {
while (it1.hasNext() && it2.hasNext()) {
it1.next();
it2.next();
}
if (it1.hasNext()) {
return 1;
} else if (it2.hasNext()) {
return -1;
}
return 0;
} finally {
it2.close();
}
} finally {
it1.close();
}
}
{code}
> Performance regression in Model.remove(Model m) method
> ------------------------------------------------------
>
> Key: JENA-1414
> URL: https://issues.apache.org/jira/browse/JENA-1414
> Project: Apache Jena
> Issue Type: Bug
> Components: Core
> Affects Versions: Jena 3.3.0, Jena 3.4.0
> Reporter: Michał Woźniak
> Attachments: graph_util_improve.patch
>
>
> The Model.remove(Model) works very slow on large models, as it propagates to
> GraphUtil.deleteFrom(Graph, Graph), which computes size of the target graph
> by iterating over all triples. This computation takes nearly 100% of the time
> of the Model.remove(Model) operation.
> It seems this commit introduced the issue:
> https://github.com/apache/jena/commit/781895ce64e062c7f2268a78189a777c39b92844#diff-fbb4d11dc804464f94c27e33e11b18e8
> Due to this bug deletion of a concept scheme on a large ontology may take
> several minutes.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)