[ 
https://issues.apache.org/jira/browse/JENA-1269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shawn Smith updated JENA-1269:
------------------------------
    Description: 
Spilling bindings with boolean literals to a DistinctDataBag or SortedDataBag 
results in parse errors when the data bag reads the bindings back in.  This 
occurs with:

{noformat}
"false"^^<http://www.w3.org/2001/XMLSchema#boolean>
"true"^^<http://www.w3.org/2001/XMLSchema#boolean>
{noformat}

It looks like there's a mismatch where booleans don't round-trip correctly 
through BindingOutputStream and BindingInputStream.  BindingOutputStream writes 
the boolean literals to the spill file as the bare keywords true or false, then 
BindingInputStream parses them as symbol tokens instead of node tokens and 
fails.
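
To make the suspected mismatch concrete, here is a toy model in plain Java. It 
is not Jena code: the class BooleanRoundTripSketch and its methods are 
hypothetical and only mimic the behaviour described above, assuming the writer 
uses the Turtle abbreviation for booleans and the reader accepts only quoted 
literals as RDF terms.

```java
// Toy model only: NOT Jena code. All names here are hypothetical.
class BooleanRoundTripSketch {
    static final String XSD_BOOLEAN = "http://www.w3.org/2001/XMLSchema#boolean";

    // Writer side: abbreviates xsd:boolean literals to a bare keyword
    // (as Turtle allows) and writes every other literal in quoted form.
    // Escaping of quotes inside the lexical form is ignored in this sketch.
    static String write(String lexical, String datatype) {
        boolean isBoolean = XSD_BOOLEAN.equals(datatype)
                && (lexical.equals("true") || lexical.equals("false"));
        if (isBoolean) {
            return lexical; // bare keyword, no quotes, no datatype
        }
        String suffix = (datatype == null) ? "" : "^^<" + datatype + ">";
        return "\"" + lexical + "\"" + suffix;
    }

    // Reader side: accepts only quoted literals as RDF terms, so a bare
    // keyword produced by write() is rejected.
    static String read(String token) {
        if (!token.startsWith("\"")) {
            throw new IllegalArgumentException(
                    "Not a valid token for an RDF term: [KEYWORD:" + token + "]");
        }
        return token;
    }

    // True if the reader accepts what the writer produced.
    static boolean roundTrips(String lexical, String datatype) {
        try {
            read(write(lexical, datatype));
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrips("xyz", null));         // prints "true"
        System.out.println(roundTrips("true", XSD_BOOLEAN)); // prints "false"
    }
}
```

In this model the plain literal survives the round trip while the boolean does 
not, which matches the symptom reported here.  Whether the real fix belongs in 
the writer or the reader is left open.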

Here's a unit test that reproduces the parse error:

{code:java}
import org.apache.jena.atlas.data.*;
import org.apache.jena.datatypes.xsd.XSDDatatype;
import org.apache.jena.graph.*;
import org.apache.jena.riot.system.SerializationFactoryFinder;
import org.apache.jena.sparql.core.Var;
import org.apache.jena.sparql.engine.binding.*;
import org.junit.Assert;
import org.junit.Test;

public class DataBagSpillTest {
  @Test
  public void testSpillBooleans() {
    Node literal = NodeFactory.createLiteral("true", XSDDatatype.XSDboolean);

    Binding parent = BindingFactory.binding(Var.alloc("a"),
        NodeFactory.createLiteral("xyz"));
    Binding binding = BindingFactory.binding(parent, Var.alloc("b"), literal);
//    Binding binding = BindingFactory.binding(BindingFactory.noParent,
//        Var.alloc("b"), literal);

    SerializationFactory<Binding> serializationFactory =
        SerializationFactoryFinder.bindingSerializationFactory();
    SortedDataBag<Binding> dataBag = BagFactory.newSortedBag(
        new ThresholdPolicyCount<>(0), serializationFactory, null);
    try {
      dataBag.add(binding);
      dataBag.flush();

      // Spill file looks like the following (uses Turtle syntax for literals):
      // VARS ?b ?a .
      // true "xyz" .

      // On reading back the dataBag it throws:
      //
      //  org.apache.jena.riot.RiotException: [line: 2, col: 7 ]
      //    Not a valid token for an RDF term: [KEYWORD:false]
      //
      // If the test is modified to leave out the 'parent' binding (uncomment
      // 'noParent' line) it throws:
      //
      //  org.apache.jena.riot.RiotException: [line: 2, col: 6 ]
      //    Too many items in a line.  Expected 1
      //

      Binding actual = dataBag.iterator().next();
      Assert.assertEquals(binding, actual);
    } finally {
      dataBag.close();
    }
  }
}
{code}


  was:
Spilling bindings with boolean literals to a DistinctDataBag or SortedDataBag 
results in parse errors when the data bag reads the bindings back in.  This 
occurs with:

{noformat}
"false"^^<http://www.w3.org/2001/XMLSchema#boolean>
"true"^^<http://www.w3.org/2001/XMLSchema#boolean>
{noformat}

It looks like there's a mismatch where booleans don't round-trip correctly 
through BindingOutputStream and BindingInputStream.  BindingOutputStream writes 
the boolean literals to the spill file as the bare keywords true or false, then 
BindingInputStream parses them as symbol tokens instead of node tokens and 
fails.

Here's a unit test that reproduces the parse error:

{code:java}
import org.apache.jena.atlas.data.*;
import org.apache.jena.datatypes.xsd.XSDDatatype;
import org.apache.jena.graph.*;
import org.apache.jena.riot.system.SerializationFactoryFinder;
import org.apache.jena.sparql.core.Var;
import org.apache.jena.sparql.engine.binding.*;
import org.junit.Assert;
import org.junit.Test;

public class JenaSparqlClientTest {
  @Test
  public void testSpillBooleans() {
    Node literal = NodeFactory.createLiteral("true", XSDDatatype.XSDboolean);

    Binding parent = BindingFactory.binding(Var.alloc("a"),
        NodeFactory.createLiteral("xyz"));
    Binding binding = BindingFactory.binding(parent, Var.alloc("b"), literal);
//    Binding binding = BindingFactory.binding(BindingFactory.noParent,
//        Var.alloc("b"), literal);

    SerializationFactory<Binding> serializationFactory =
        SerializationFactoryFinder.bindingSerializationFactory();
    SortedDataBag<Binding> dataBag = BagFactory.newSortedBag(
        new ThresholdPolicyCount<>(0), serializationFactory, null);
    try {
      dataBag.add(binding);
      dataBag.flush();

      // Spill file looks like the following (uses Turtle syntax for literals):
      // VARS ?b ?a .
      // true "xyz" .

      // On reading back the dataBag it throws:
      //
      //  org.apache.jena.riot.RiotException: [line: 2, col: 7 ]
      //    Not a valid token for an RDF term: [KEYWORD:false]
      //
      // If the test is modified to leave out the 'parent' binding (uncomment
      // 'noParent' line) it throws:
      //
      //  org.apache.jena.riot.RiotException: [line: 2, col: 6 ]
      //    Too many items in a line.  Expected 1
      //

      Binding actual = dataBag.iterator().next();
      Assert.assertEquals(binding, actual);
    } finally {
      dataBag.close();
    }
  }
}
{code}



> Spilling a data bag with boolean literals throws a parse exception
> ------------------------------------------------------------------
>
>                 Key: JENA-1269
>                 URL: https://issues.apache.org/jira/browse/JENA-1269
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: ARQ
>    Affects Versions: Jena 3.1.1
>            Reporter: Shawn Smith
>
> Spilling bindings with boolean literals to a DistinctDataBag or SortedDataBag 
> results in parse errors when the data bag reads the bindings back in.  This 
> occurs with:
> {noformat}
> "false"^^<http://www.w3.org/2001/XMLSchema#boolean>
> "true"^^<http://www.w3.org/2001/XMLSchema#boolean>
> {noformat}
> It looks like there's a mismatch where booleans don't round-trip correctly 
> through BindingOutputStream and BindingInputStream.  BindingOutputStream 
> writes the boolean literals to the spill file as the bare keywords true or 
> false, then BindingInputStream parses them as symbol tokens instead of node 
> tokens and fails.
> Here's a unit test that reproduces the parse error:
> {code:java}
> import org.apache.jena.atlas.data.*;
> import org.apache.jena.datatypes.xsd.XSDDatatype;
> import org.apache.jena.graph.*;
> import org.apache.jena.riot.system.SerializationFactoryFinder;
> import org.apache.jena.sparql.core.Var;
> import org.apache.jena.sparql.engine.binding.*;
> import org.junit.Assert;
> import org.junit.Test;
> public class DataBagSpillTest {
>   @Test
>   public void testSpillBooleans() {
>     Node literal = NodeFactory.createLiteral("true", XSDDatatype.XSDboolean);
>     Binding parent = BindingFactory.binding(Var.alloc("a"),
>         NodeFactory.createLiteral("xyz"));
>     Binding binding = BindingFactory.binding(parent, Var.alloc("b"), literal);
> //    Binding binding = BindingFactory.binding(BindingFactory.noParent,
> //        Var.alloc("b"), literal);
>     SerializationFactory<Binding> serializationFactory =
>         SerializationFactoryFinder.bindingSerializationFactory();
>     SortedDataBag<Binding> dataBag = BagFactory.newSortedBag(
>         new ThresholdPolicyCount<>(0), serializationFactory, null);
>     try {
>       dataBag.add(binding);
>       dataBag.flush();
>       // Spill file looks like the following (uses Turtle syntax for
>       // literals):
>       // VARS ?b ?a .
>       // true "xyz" .
>       // On reading back the dataBag it throws:
>       //
>       //  org.apache.jena.riot.RiotException: [line: 2, col: 7 ]
>       //    Not a valid token for an RDF term: [KEYWORD:false]
>       //
>       // If the test is modified to leave out the 'parent' binding (uncomment
>       // 'noParent' line) it throws:
>       //
>       //  org.apache.jena.riot.RiotException: [line: 2, col: 6 ]
>       //    Too many items in a line.  Expected 1
>       //
>       Binding actual = dataBag.iterator().next();
>       Assert.assertEquals(binding, actual);
>     } finally {
>       dataBag.close();
>     }
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
