[
https://issues.apache.org/jira/browse/JENA-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Shawn Smith updated JENA-1771:
------------------------------
Description:
It looks like Jena assumes that OpDistinct preserves order, but order is not
preserved when spilling occurs. This is only a problem when the
ARQ.spillToDiskThreshold setting has been configured.
Consider the following query:
{code:java}
PREFIX : <http://example.org/>
SELECT DISTINCT *
WHERE
{ ?x :p ?v }
ORDER BY ASC(?v)
{code}
Here's the query plan for this query:
{code:java}
(distinct
(order ((asc ?v))
(bgp (triple ?x <http://example.org/p> ?v))))
{code}
Jena executes the ORDER BY ASC(?v) before the DISTINCT, relying on the SPARQL
requirement:
{quote}The order of Distinct(Ψ) must preserve any ordering given by OrderBy.
{quote}
But, when spilling, QueryIterDistinct (which executes OpDistinct) creates a
DistinctDataBag with a BindingComparator without any sort conditions. As a
result, the DISTINCT operation sorts using "compareBindingsSyntactic()" and
doesn't preserve the ORDER BY ASC(?v) requirement.
Note that some query plans will reorder the ORDER BY and DISTINCT, making
things work correctly. For example, adding a LIMIT clause to the query above
results in a "(top (5 (asc ?v))" operation that doesn't suffer from the bug.
You can reproduce this by injecting the following into QueryTest.java then
running the ARQTestRefEngine tests:
{code:java}
void runTestSelect(Query query, QueryExecution qe)
{
qe.getContext().set(ARQ.spillToDiskThreshold, 4); // add this line
...
{code}
For example, "ARQTestRefEngine -> Algebra optimizations ->
QueryTest.opt-top-05" will fail with:
{code:java}
Query:
PREFIX : <http://example/>
SELECT DISTINCT *
WHERE
{ ?x :p ?v }
ORDER BY ASC(?v)
LIMIT 5
Got: 5 --------------------------------
-------------
| x | v |
=============
| :x1 | 1 |
| :x2 | 2 |
| :x10 | 10 |
| :x11 | 11 |
| :x12 | 12 |
-------------
Expected: 5 -----------------------------
------------
| x | v |
============
| :x1 | 1 |
| :x2 | 2 |
| :x3 | 3 |
| :x3a | 3 |
| :x4 | 4 |
------------
junit.framework.AssertionFailedError: Results do not match
at junit.framework.Assert.fail(Assert.java:57)
at junit.framework.Assert.assertTrue(Assert.java:22)
at junit.framework.TestCase.assertTrue(TestCase.java:192)
at
org.apache.jena.sparql.junit.QueryTest.runTestSelect(QueryTest.java:284)
at
org.apache.jena.sparql.junit.QueryTest.runTestForReal(QueryTest.java:201)
at
org.apache.jena.sparql.junit.EarlTestCase.runTest(EarlTestCase.java:88)
at junit.framework.TestCase.runBare(TestCase.java:141)
at junit.framework.TestResult$1.protect(TestResult.java:122)
at junit.framework.TestResult.runProtected(TestResult.java:142)
at junit.framework.TestResult.run(TestResult.java:125)
at junit.framework.TestCase.run(TestCase.java:129)
{code}
was:
It looks like Jena assumes that OpDistinct preserves order, but order is not
preserved when spilling occurs. This is only a problem when the
ARQ.spillToDiskThreshold setting has been configured.
Consider the following query:
{code:java}
SELECT DISTINCT *
WHERE
{ ?x :p ?v }
ORDER BY ASC(?v)
{code}
Jena executes the ORDER BY ASC(?v) before the DISTINCT, relying on the SPARQL
requirement:
bq. The order of Distinct(Ψ) must preserve any ordering given by OrderBy.
But, when spilling, QueryIterDistinct (which executes OpDistinct) creates a
DistinctDataBag with a BindingComparator without any sort conditions. As a
result, the DISTINCT operation doesn't preserve the ORDER BY ASC(?v)
requirement.
You can reproduce this by injecting the following into QueryTest.java then
running the ARQTestRefEngine tests:
{code:java}
void runTestSelect(Query query, QueryExecution qe)
{
qe.getContext().set(ARQ.spillToDiskThreshold, 4); // add this line
...
{code}
For example, "ARQTestRefEngine -> Algebra optimizations ->
QueryTest.opt-top-05" will fail with:
{code}
Query:
PREFIX : <http://example/>
SELECT DISTINCT *
WHERE
{ ?x :p ?v }
ORDER BY ASC(?v)
LIMIT 5
Got: 5 --------------------------------
-------------
| x | v |
=============
| :x1 | 1 |
| :x2 | 2 |
| :x10 | 10 |
| :x11 | 11 |
| :x12 | 12 |
-------------
Expected: 5 -----------------------------
------------
| x | v |
============
| :x1 | 1 |
| :x2 | 2 |
| :x3 | 3 |
| :x3a | 3 |
| :x4 | 4 |
------------
junit.framework.AssertionFailedError: Results do not match
at junit.framework.Assert.fail(Assert.java:57)
at junit.framework.Assert.assertTrue(Assert.java:22)
at junit.framework.TestCase.assertTrue(TestCase.java:192)
at
org.apache.jena.sparql.junit.QueryTest.runTestSelect(QueryTest.java:284)
at
org.apache.jena.sparql.junit.QueryTest.runTestForReal(QueryTest.java:201)
at
org.apache.jena.sparql.junit.EarlTestCase.runTest(EarlTestCase.java:88)
at junit.framework.TestCase.runBare(TestCase.java:141)
at junit.framework.TestResult$1.protect(TestResult.java:122)
at junit.framework.TestResult.runProtected(TestResult.java:142)
at junit.framework.TestResult.run(TestResult.java:125)
at junit.framework.TestCase.run(TestCase.java:129)
{code}
> Spilling combined with DISTINCT .. ORDER BY returns rows in the wrong order
> ---------------------------------------------------------------------------
>
> Key: JENA-1771
> URL: https://issues.apache.org/jira/browse/JENA-1771
> Project: Apache Jena
> Issue Type: Bug
> Components: ARQ
> Affects Versions: Jena 3.13.1
> Reporter: Shawn Smith
> Priority: Major
>
> It looks like Jena assumes that OpDistinct preserves order, but order is not
> preserved when spilling occurs. This is only a problem when the
> ARQ.spillToDiskThreshold setting has been configured.
> Consider the following query:
> {code:java}
> PREFIX : <http://example.org/>
> SELECT DISTINCT *
> WHERE
> { ?x :p ?v }
> ORDER BY ASC(?v)
> {code}
> Here's the query plan for this query:
> {code:java}
> (distinct
> (order ((asc ?v))
> (bgp (triple ?x <http://example.org/p> ?v))))
> {code}
> Jena executes the ORDER BY ASC(?v) before the DISTINCT, relying on the SPARQL
> requirement:
> {quote}The order of Distinct(Ψ) must preserve any ordering given by OrderBy.
> {quote}
> But, when spilling, QueryIterDistinct (which executes OpDistinct) creates a
> DistinctDataBag with a BindingComparator without any sort conditions. As a
> result, the DISTINCT operation sorts using "compareBindingsSyntactic()" and
> doesn't preserve the ORDER BY ASC(?v) requirement.
> Note that some query plans will reorder the ORDER BY and DISTINCT, making
> things work correctly. For example, adding a LIMIT clause to the query above
> results in a "(top (5 (asc ?v))" operation that doesn't suffer from the bug.
> You can reproduce this by injecting the following into QueryTest.java then
> running the ARQTestRefEngine tests:
> {code:java}
> void runTestSelect(Query query, QueryExecution qe)
> {
> qe.getContext().set(ARQ.spillToDiskThreshold, 4); // add this line
> ...
> {code}
> For example, "ARQTestRefEngine -> Algebra optimizations ->
> QueryTest.opt-top-05" will fail with:
> {code:java}
> Query:
> PREFIX : <http://example/>
> SELECT DISTINCT *
> WHERE
> { ?x :p ?v }
> ORDER BY ASC(?v)
> LIMIT 5
> Got: 5 --------------------------------
> -------------
> | x | v |
> =============
> | :x1 | 1 |
> | :x2 | 2 |
> | :x10 | 10 |
> | :x11 | 11 |
> | :x12 | 12 |
> -------------
> Expected: 5 -----------------------------
> ------------
> | x | v |
> ============
> | :x1 | 1 |
> | :x2 | 2 |
> | :x3 | 3 |
> | :x3a | 3 |
> | :x4 | 4 |
> ------------
> junit.framework.AssertionFailedError: Results do not match
> at junit.framework.Assert.fail(Assert.java:57)
> at junit.framework.Assert.assertTrue(Assert.java:22)
> at junit.framework.TestCase.assertTrue(TestCase.java:192)
> at
> org.apache.jena.sparql.junit.QueryTest.runTestSelect(QueryTest.java:284)
> at
> org.apache.jena.sparql.junit.QueryTest.runTestForReal(QueryTest.java:201)
> at
> org.apache.jena.sparql.junit.EarlTestCase.runTest(EarlTestCase.java:88)
> at junit.framework.TestCase.runBare(TestCase.java:141)
> at junit.framework.TestResult$1.protect(TestResult.java:122)
> at junit.framework.TestResult.runProtected(TestResult.java:142)
> at junit.framework.TestResult.run(TestResult.java:125)
> at junit.framework.TestCase.run(TestCase.java:129)
> {code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)