[
https://issues.apache.org/jira/browse/JENA-90?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092086#comment-13092086
]
Stephen Allen commented on JENA-90:
-----------------------------------
Hi Paolo,
I think the approach you want is to use QueryIterReduced instead of the new
QueryIterDistinctSort class you propose (also, an important note: [1]).
QueryIterReduced could perhaps be optimized a little by eliminating the
general-purpose window array and using a single variable in this particular
case of a sorted input.
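To illustrate the single-variable idea, here is a minimal sketch of an
iterator that drops adjacent duplicates while remembering only the last
element it emitted. AdjacentDedupIterator and its dedup helper are
hypothetical names used for illustration; this is not QueryIterReduced or any
other ARQ class, and it assumes the elements implement equals().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Objects;

// Hypothetical illustration, not an ARQ class: removes adjacent duplicates
// using one remembered element instead of a window array.
final class AdjacentDedupIterator<T> implements Iterator<T> {
    private final Iterator<T> input;
    private T last;             // the single "window" slot: last element emitted
    private boolean hasLast;    // false until the first element has been emitted
    private T pending;          // next element to hand out, if any
    private boolean hasPending;

    AdjacentDedupIterator(Iterator<T> input) {
        this.input = input;
    }

    @Override
    public boolean hasNext() {
        // Skip input elements equal to the last one emitted.
        while (!hasPending && input.hasNext()) {
            T candidate = input.next();
            if (!hasLast || !Objects.equals(candidate, last)) {
                pending = candidate;
                hasPending = true;
            }
        }
        return hasPending;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        last = pending;
        hasLast = true;
        hasPending = false;
        return last;
    }

    // Convenience helper: collect the de-duplicated elements into a list.
    static <T> List<T> dedup(List<T> input) {
        List<T> out = new ArrayList<>();
        Iterator<T> it = new AdjacentDedupIterator<>(input.iterator());
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }
}
```

On a sorted input every duplicate is adjacent, so the result is fully
distinct. On an unsorted input only runs of equal elements collapse, which is
why the sortedness of the input matters.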
That said, in my mind a better approach would be to modify the algebra as part
of a query optimization step (replacing the OpDistinct with an OpReduced) when
it is known that the QueryIterator to which it is applied is sorted (either
because of an underlying OpOrder or a sorted triple/quad index). This makes it
easier to determine what is going on during query execution by examining the
transformed algebra, instead of having branches in the physical operators
themselves.
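The rewrite described above can be sketched on a toy algebra. Everything in
this block (PlanNode, ScanNode, OrderNode, DistinctNode, ReducedNode,
DistinctToReducedRewrite) is a hypothetical stand-in and not Jena's Op or
Transform API. It also assumes that an order node sorts on every column of
the row, so that equal rows end up adjacent; a sort on only some columns
would not justify the replacement.

```java
// Toy algebra for illustration only; these are NOT Jena's Op classes.
abstract class PlanNode {
    final PlanNode sub; // child operator, or null for a leaf

    PlanNode(PlanNode sub) {
        this.sub = sub;
    }

    @Override
    public String toString() {
        String name = getClass().getSimpleName();
        return sub == null ? name : name + "(" + sub + ")";
    }
}

final class ScanNode extends PlanNode {
    ScanNode() { super(null); }
}

final class OrderNode extends PlanNode {
    OrderNode(PlanNode sub) { super(sub); }
}

final class DistinctNode extends PlanNode {
    DistinctNode(PlanNode sub) { super(sub); }
}

final class ReducedNode extends PlanNode {
    ReducedNode(PlanNode sub) { super(sub); }
}

final class DistinctToReducedRewrite {
    // Assumption: only an explicit order node is known to produce sorted output.
    // A real optimizer could also recognize a sorted index scan here.
    static boolean producesSortedOutput(PlanNode n) {
        return n instanceof OrderNode;
    }

    // Bottom-up rewrite: Distinct over a sorted child becomes Reduced.
    static PlanNode rewrite(PlanNode n) {
        if (n.sub == null) {
            return n; // leaf, nothing to rewrite
        }
        PlanNode sub = rewrite(n.sub);
        if (n instanceof DistinctNode) {
            return producesSortedOutput(sub) ? new ReducedNode(sub) : new DistinctNode(sub);
        }
        if (n instanceof OrderNode) {
            return new OrderNode(sub);
        }
        if (n instanceof ReducedNode) {
            return new ReducedNode(sub);
        }
        throw new IllegalArgumentException("unknown node: " + n);
    }
}
```

Printing the plan before and after the rewrite shows the decision directly in
the algebra, which is the debuggability benefit the comment argues for.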
[1] DistinctDataBag is not guaranteed to be sorted. The in-memory bindings
are stored in a HashSet, thus if the bag does not spill to disk then no attempt
is made to sort the bindings in the iterator (so as not to perform extra
effort). It would not be hard to create a DistinctSortedDataBag, but I'm not
sure that it is necessary (and IMO limiting the number of primitive operations
helps simplify the system).
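The ordering point in [1] can be seen with plain java.util collections. This
sketch uses HashSet and TreeSet as stand-ins and does not use the
DistinctDataBag API; DistinctOrderDemo and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.TreeSet;

// Illustration only: hash-based de-duplication gives no ordering guarantee,
// while a sorted set does.
final class DistinctOrderDemo {
    // Distinct elements in whatever order the HashSet iterates them.
    static List<String> distinctHashed(List<String> in) {
        return new ArrayList<>(new HashSet<>(in));
    }

    // Distinct elements in natural (sorted) order.
    static List<String> distinctSorted(List<String> in) {
        return new ArrayList<>(new TreeSet<>(in));
    }
}
```

Both methods return the same set of elements, but only distinctSorted
promises an order, so code that needs sorted output should not rely on the
hash-based variant.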
> Use OpReduce instead of OpDistinct for DISTINCT + ORDER BY queries
> ------------------------------------------------------------------
>
> Key: JENA-90
> URL: https://issues.apache.org/jira/browse/JENA-90
> Project: Jena
> Issue Type: Improvement
> Components: ARQ
> Reporter: Paolo Castagna
> Assignee: Paolo Castagna
> Priority: Trivial
> Labels: arq, optimizer, sparql
> Attachments: ARQ_JENA-90_r1159636.patch
>
>
> ARQ's optimizer could use an OpReduce instead of OpDistinct if a query is
> DISTINCT + ORDER BY.
> OpReduce removes adjacent duplicates and it does not require a set of already
> seen bindings as the current OpDistinct implementation does.