[
https://issues.apache.org/jira/browse/JENA-108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096116#comment-13096116
]
Andy Seaborne edited comment on JENA-108 at 9/2/11 5:21 PM:
------------------------------------------------------------
We can add "reduced" as well as "distinct". Reduced can validly be implemented
as "distinct" and, in this optimization, the more predicable outcome is useful.
It also makes the TopN optimization and the DistinctToReduced commute.
was (Author: andy.seaborne):
We can add "reduced" as well as "distinct". Reduced can validly be
implemented as "distinct" and, in this optimization, the more predicable
outcome is useful.
> Optimise DISTINCT + ORDER BY + LIMIT queries
> --------------------------------------------
>
> Key: JENA-108
> URL: https://issues.apache.org/jira/browse/JENA-108
> Project: Jena
> Issue Type: Improvement
> Components: ARQ
> Reporter: Paolo Castagna
> Assignee: Paolo Castagna
> Priority: Minor
> Labels: optimization, sparql
>
> When users what to find the top 10 most 'something' 'things' in their data
> write queries of this form: SELECT DISTINCT * { ... } ORDER BY ... LIMIT 10
> We can add a specific low level optimization for this type of queries using a
> new QueryIterTopNDistinct iterator.
> This SPARQL query:
> SELECT * { ... } ORDER BY ... LIMIT 10
> has this algebra:
> (slice _ 10
> (order (...)
> (bgp (...))))
> which is now transformed into (see JENA-89):
> (top (10 ...)
> (bgp (...)))
> This query:
> SELECT DISTINCT * { ... } ORDER BY ...
> has this algebra:
> (distinct
> (order (...)
> (bgp (...))))
> which is now transformed into (see JENA-90):
> (reduced
> (order (...)
> (bgp (...))))
> What about:
> SELECT DISTINCT * { ... } ORDER BY ... LIMIT 10 ?
> The algebra expression is:
> (slice _ 10
> (distinct
> (order (?p ?o)
> (bgp (triple ?s ?p ?o)))))
> which is (currently) transformed into:
> (slice _ 10
> (reduced
> (order (?p ?o)
> (bgp (triple ?s ?p ?o)))))
> The optimization in JENA-90 kicks in, but it would be better to use the top
> operator for small limits.
> However, it must be a slightly different top implementation (which accepts
> only distinct bindings).
> As Andy suggested in a comment on JENA-90 (12 August 2011), we can transform
> the algebra into:
> (top (10 ...)
> (distinct
> (bgp (...))))
> and then use a QueryIterTopNDistinct when OpExecutor(OpTop) sees an
> opTop.getSubOp() instanceof OpDistinct.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira