[ 
https://issues.apache.org/jira/browse/JENA-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14562768#comment-14562768
 ] 

Andy Seaborne commented on JENA-949:
------------------------------------

Analysis:

The problem is that the return from the "distinct data net" is misused:

{code:title=QueryIterDistinct}
    @Override
    protected boolean isFreshSighting(Binding binding)
    {
        return db.netAdd(binding) ;
    }
{code}

A return of true means the item is definitely new; a return of false covers 
two cases. While filling the first part of the bag, the distinct data net 
returns false if the item is a duplicate. Once it starts spilling, it always 
returns false, meaning "unknown". {{QueryIterDistinct}} does not go back to 
check the data bag when the input iterator closes. What is more, some results 
have already been yielded, so the data bag iterator on its own is the wrong 
answer.

The effect on {{QueryIterDistinct}} is that it will always skip over items 
added to the spilled data.
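
The behaviour described above can be reproduced with a small standalone 
sketch. This is not Jena's code: {{SpillSketch}}, {{SpillingNet}} and 
{{distinctNaive}} are hypothetical names, strings stand in for bindings, and 
"spilling" is modelled as an in-memory list. It only illustrates how reading 
the boolean as "is fresh" drops every item seen after the threshold is reached.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the distinct data net; not Jena's implementation.
class SpillSketch {

    /** Holds up to `threshold` distinct items in memory, then "spills". */
    static class SpillingNet {
        private final int threshold;
        private final Set<String> memory = new HashSet<>();
        private final List<String> spilled = new ArrayList<>();

        SpillingNet(int threshold) {
            this.threshold = threshold;
        }

        /** true = definitely new; false = duplicate OR, once spilling, unknown. */
        boolean netAdd(String item) {
            if (memory.size() < threshold) {
                return memory.add(item);   // false only for a real duplicate
            }
            spilled.add(item);             // assumption: a list models the spill file
            return false;                  // "unknown", not "duplicate"
        }
    }

    /** Mirrors the misuse: the boolean is read as "this is a fresh sighting". */
    static List<String> distinctNaive(List<String> input, int threshold) {
        SpillingNet net = new SpillingNet(threshold);
        List<String> out = new ArrayList<>();
        for (String item : input) {
            if (net.netAdd(item)) {
                out.add(item);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("g1", "g2", "g3", "g2", "g4");
        System.out.println(distinctNaive(input, 100)); // [g1, g2, g3, g4]
        System.out.println(distinctNaive(input, 2));   // [g1, g2]
    }
}
```

With a threshold of 2 the sketch yields only the first two items, which 
matches the shape of the wrong output in the report.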


> DISTINCT spilling to a data bag leads to wrong answers.
> -------------------------------------------------------
>
>                 Key: JENA-949
>                 URL: https://issues.apache.org/jira/browse/JENA-949
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: ARQ
>    Affects Versions: Jena 3.0.0
>            Reporter: Andy Seaborne
>         Attachments: Jena949_1.java
>
>
> In the attached example, the same query is made twice. The second time 
> {{ARQ.spillToDiskThreshold}} is set to 2L.  The first results are correct.
> [email 
> 2015-05-20|http://mail-archives.apache.org/mod_mbox/jena-users/201505.mbox/%3C34B3B313-EAE4-4498-875F-A9674A8B3B2D%40interition.net%3E]
> reports a possibly similar situation at scale.
> The presence of {{DISTINCT}} is the key factor.
> Output:
> {noformat}
> -----------------------
> | g                   |
> =======================
> | <http://example/g1> |
> | <http://example/g2> |
> | <http://example/g3> |
> | <http://example/g4> |
> | <http://example/g5> |
> | <http://example/g6> |
> | <http://example/g7> |
> | <http://example/g8> |
> | <http://example/g9> |
> | <http://example/g0> |
> -----------------------
> -----------------------
> | g                   |
> =======================
> | <http://example/g1> |
> | <http://example/g2> |
> -----------------------
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
