[
https://issues.apache.org/jira/browse/JENA-1269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Shawn Smith updated JENA-1269:
------------------------------
Description:
Spilling bindings with boolean literals to a DistinctDataBag or SortedDataBag
results in parse errors when the data bag reads the bindings back in. This
occurs with:
{noformat}
"false"^^<http://www.w3.org/2001/XMLSchema#boolean>
"true"^^<http://www.w3.org/2001/XMLSchema#boolean>
{noformat}
It looks like there's a mismatch where booleans don't round trip correctly
through BindingOutputStream and BindingInputStream. BindingOutputStream writes
the boolean literals to the spill file as the bare keywords true or false
(no quotes, no datatype), then BindingInputStream parses them as symbol tokens
instead of node tokens and fails.
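To make the suspected asymmetry concrete, here is a small self-contained toy model. It is not Jena's code: the class name BooleanRoundTripSketch, its methods, and the string-based "tokens" are all hypothetical, and the exception text only imitates the message seen in the report. It shows a writer that abbreviates xsd:boolean literals, a strict reader that rejects the abbreviation, and a tolerant reader that would accept it.

```java
/** Toy model of the writer/reader mismatch; hypothetical names, not Jena's API. */
final class BooleanRoundTripSketch {
    static final String XSD_BOOLEAN = "http://www.w3.org/2001/XMLSchema#boolean";

    /** Writer side: abbreviates xsd:boolean literals to a bare keyword. */
    static String write(String lexical, String datatypeIri) {
        boolean isKeyword = lexical.equals("true") || lexical.equals("false");
        if (XSD_BOOLEAN.equals(datatypeIri) && isKeyword) {
            return lexical; // written as true / false, without quotes or datatype
        }
        return "\"" + lexical + "\"^^<" + datatypeIri + ">";
    }

    /** Strict reader: accepts only quoted literals, so bare keywords fail. */
    static String readStrict(String token) {
        if (!token.startsWith("\"")) {
            throw new IllegalArgumentException(
                    "Not a valid token for an RDF term: [KEYWORD:" + token + "]");
        }
        return token;
    }

    /** Tolerant reader: maps the bare keywords back to typed literals first. */
    static String readTolerant(String token) {
        if (token.equals("true") || token.equals("false")) {
            return "\"" + token + "\"^^<" + XSD_BOOLEAN + ">";
        }
        return readStrict(token);
    }
}
```

In this model the round trip only succeeds when the two sides agree: either the writer stops abbreviating booleans, or the reader learns to accept the keywords. Which side should change in Jena itself is not something this sketch decides.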
Here's a unit test that reproduces the parse error:
{code:java}
import org.apache.jena.atlas.data.*;
import org.apache.jena.datatypes.xsd.XSDDatatype;
import org.apache.jena.graph.*;
import org.apache.jena.riot.system.SerializationFactoryFinder;
import org.apache.jena.sparql.core.Var;
import org.apache.jena.sparql.engine.binding.*;
import org.junit.Assert;
import org.junit.Test;

public class DataBagSpillTest {
    @Test
    public void testSpillBooleans() {
        Node literal = NodeFactory.createLiteral("true", XSDDatatype.XSDboolean);
        Binding parent = BindingFactory.binding(Var.alloc("a"),
                NodeFactory.createLiteral("xyz"));
        Binding binding = BindingFactory.binding(parent, Var.alloc("b"), literal);
        // Binding binding = BindingFactory.binding(BindingFactory.noParent,
        //         Var.alloc("b"), literal);
        SerializationFactory<Binding> serializationFactory =
                SerializationFactoryFinder.bindingSerializationFactory();
        SortedDataBag<Binding> dataBag = BagFactory.newSortedBag(
                new ThresholdPolicyCount<>(0), serializationFactory, null);
        try {
            dataBag.add(binding);
            dataBag.flush();
            // Spill file looks like the following (uses Turtle syntax for literals):
            //   VARS ?b ?a .
            //   true "xyz" .
            // On reading back the dataBag it throws:
            //
            //   org.apache.jena.riot.RiotException: [line: 2, col: 7 ]
            //   Not a valid token for an RDF term: [KEYWORD:false]
            //
            // If the test is modified to leave out the 'parent' binding
            // (uncomment 'noParent' line) it throws:
            //
            //   org.apache.jena.riot.RiotException: [line: 2, col: 6 ]
            //   Too many items in a line. Expected 1
            //
            Binding actual = dataBag.iterator().next();
            Assert.assertEquals(binding, actual);
        } finally {
            dataBag.close();
        }
    }
}
{code}
was:
Spilling bindings with boolean literals to a DistinctDataBag or SortedDataBag
results in parse errors when the data bag reads the bindings back in. This
occurs with:
{noformat}
"false"^^<http://www.w3.org/2001/XMLSchema#boolean>
"true"^^<http://www.w3.org/2001/XMLSchema#boolean>
{noformat}
It looks like there's a mismatch where booleans don't round trip correctly
through BindingOutputStream and BindingInputStream. BindingOutputStream writes
the boolean literals to the spill file as the bare keywords true or false
(no quotes, no datatype), then BindingInputStream parses them as symbol tokens
instead of node tokens and fails.
Here's a unit test that reproduces the parse error:
{code:java}
import org.apache.jena.atlas.data.*;
import org.apache.jena.datatypes.xsd.XSDDatatype;
import org.apache.jena.graph.*;
import org.apache.jena.riot.system.SerializationFactoryFinder;
import org.apache.jena.sparql.core.Var;
import org.apache.jena.sparql.engine.binding.*;
import org.junit.Assert;
import org.junit.Test;

public class JenaSparqlClientTest {
    @Test
    public void testSpillBooleans() {
        Node literal = NodeFactory.createLiteral("true", XSDDatatype.XSDboolean);
        Binding parent = BindingFactory.binding(Var.alloc("a"),
                NodeFactory.createLiteral("xyz"));
        Binding binding = BindingFactory.binding(parent, Var.alloc("b"), literal);
        // Binding binding = BindingFactory.binding(BindingFactory.noParent,
        //         Var.alloc("b"), literal);
        SerializationFactory<Binding> serializationFactory =
                SerializationFactoryFinder.bindingSerializationFactory();
        SortedDataBag<Binding> dataBag = BagFactory.newSortedBag(
                new ThresholdPolicyCount<>(0), serializationFactory, null);
        try {
            dataBag.add(binding);
            dataBag.flush();
            // Spill file looks like the following (uses Turtle syntax for literals):
            //   VARS ?b ?a .
            //   true "xyz" .
            // On reading back the dataBag it throws:
            //
            //   org.apache.jena.riot.RiotException: [line: 2, col: 7 ]
            //   Not a valid token for an RDF term: [KEYWORD:false]
            //
            // If the test is modified to leave out the 'parent' binding
            // (uncomment 'noParent' line) it throws:
            //
            //   org.apache.jena.riot.RiotException: [line: 2, col: 6 ]
            //   Too many items in a line. Expected 1
            //
            Binding actual = dataBag.iterator().next();
            Assert.assertEquals(binding, actual);
        } finally {
            dataBag.close();
        }
    }
}
{code}
> Spilling a data bag with boolean literals throws a parse exception
> ------------------------------------------------------------------
>
> Key: JENA-1269
> URL: https://issues.apache.org/jira/browse/JENA-1269
> Project: Apache Jena
> Issue Type: Bug
> Components: ARQ
> Affects Versions: Jena 3.1.1
> Reporter: Shawn Smith
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)