[
https://issues.apache.org/jira/browse/JENA-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16365714#comment-16365714
]
Andy Seaborne commented on JENA-1489:
-------------------------------------
First thought:
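In RDF, every time the same RDF syntax is read, the parser generates fresh
blank nodes, so two reads of the same file never share a blank node.
This can show up if you POST RDF data because POST is "add triples to the
destination": ground triples that are already present are not duplicated, but
each blank node structure is added again under new blank nodes. The other
operation is HTTP PUT (and "put" in the {{RDFConnection}} interface). PUT
replaces the content.
{{loadDatasetSimple}} uses {{loadDataset}}, which is a POST (append).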
If it is actually writing twice, there will be two requests in the Fuseki log
file.
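To make the append-versus-replace point concrete, here is a minimal sketch in plain Java. It is a toy model, not Jena code: triples are plain strings, and the class {{BlankNodeAppendSketch}} and its methods {{parseOnce}}, {{post}}, {{put}} and {{countEvents}} are hypothetical names made up for illustration. It only assumes that a graph is a set of triples and that each read of the same syntax yields new blank node labels.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model only (not Jena): a graph is a set of triples written as strings. */
public class BlankNodeAppendSketch {
    // Source of fresh blank node labels; every simulated parse takes a new one.
    private static int nextLabel = 0;

    /** Simulates one read of the same document: the "[ ... ]" node gets a fresh label. */
    static List<String> parseOnce() {
        String event = "_:b" + (nextLabel++);
        List<String> triples = new ArrayList<>();
        triples.add("bdr:P58 rdf:type :Person");          // ground triple, identical on every read
        triples.add("bdr:P58 :personEvent " + event);     // mentions the fresh blank node
        triples.add(event + " rdf:type :PersonDeath");
        triples.add(event + " :onOrAbout \"1472\"");
        return triples;
    }

    /** POST semantics: add the triples to whatever the graph already holds. */
    static void post(Set<String> graph, List<String> triples) {
        graph.addAll(triples);
    }

    /** PUT semantics: replace the content of the graph. */
    static void put(Set<String> graph, List<String> triples) {
        graph.clear();
        graph.addAll(triples);
    }

    /** Counts the :personEvent triples in the graph. */
    static long countEvents(Set<String> graph) {
        return graph.stream().filter(t -> t.contains(" :personEvent ")).count();
    }

    public static void main(String[] args) {
        Set<String> posted = new LinkedHashSet<>();
        post(posted, parseOnce());
        post(posted, parseOnce());
        System.out.println("POST twice: " + posted.size() + " triples, "
                + countEvents(posted) + " events");

        Set<String> replaced = new LinkedHashSet<>();
        put(replaced, parseOnce());
        put(replaced, parseOnce());
        System.out.println("PUT twice: " + replaced.size() + " triples, "
                + countEvents(replaced) + " events");
    }
}
```

In this toy model, posting the same document twice leaves the ground triple once but the blank node event twice, which is the shape of the {{describe bdr:P58}} output in the report; putting it twice leaves a single copy. Whether the same thing is happening in the reported setup still needs to be checked against the Fuseki log.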
> models written twice on RDFConnection
> -------------------------------------
>
> Key: JENA-1489
> URL: https://issues.apache.org/jira/browse/JENA-1489
> Project: Apache Jena
> Issue Type: Bug
> Components: Fuseki, Jena, TDB
> Affects Versions: Jena 3.7.0
> Environment: Jena 3.7.0-Snapshot, Java 1.8.0_131 on Mac OS 10.13.3,
> Java 1.8.0_151-8u151-b12-1~deb9u1-b12 on Debian Stretch
> Reporter: Code Ferret
> Priority: Major
>
> *Problem*: I am transferring models via {{RDFConnection}} to {{TDB}} and
> seeing doubling of blank nodes in _some_ graphs as though the same model is
> written a second time *after* a commit during the transfer. I apologize in
> advance for the length of this report.
> *Details*: We have a collection of entity types: Persons, Items, Works and so
> on. Each entity is a graph in a ttl file in a per type git repo. For each
> type, the ttl files are read from the corresponding repo into models and the
> models are added to a {{Dataset}} until the number of triples in the dataset
> exceeds a threshold, e.g., 50,000 triples. When the threshold is exceeded
> then the dataset is loaded to Fuseki via an RDFConnection:
> {code:java}
> fuConn = RDFConnectionFactory.connect(baseUrl, baseUrl+"/query",
> baseUrl+"/update", baseUrl+"/data");
> {code}
> which is opened once at the beginning of loading all entity types. The kernel
> of loading is performed via:
> {code:java}
> private static void loadDatasetSimple(final Dataset ds) {
> if (!fuConn.isInTransaction()) {
> fuConn.begin(ReadWrite.WRITE);
> }
> fuConn.loadDataset(ds);
> fuConn.commit();
> }
> {code}
> The {{loadDatasetSimple}} is called until all of the entities of a given type
> have been loaded from the corresponding repo. Since there may be some models
> not yet transferred after reading in all of the entities of a given type then
> a finish method is called:
> {code:java}
> static void finishDatasetTransfers() {
> // if map is not empty, transfer the last one
> if (currentDataset != null) {
> loadDatasetSimple(currentDataset);
> }
> }
> {code}
> After loading a given type of entity the next type in a list of types to
> transfer is processed as described above and this is when the problem is
> noticed.
> Once enough models of the next type have been added to the transfer dataset
> and that dataset is transferred via {{loadDatasetSimple}} then _some_ of the
> previously transferred graphs exhibit doubled blank nodes. Here is {{describe
> bdr:P58}} to illustrate the doubling:
> {code:java}
> @prefix : <http://purl.bdrc.io/ontology/core/> .
> @prefix bdr: <http://purl.bdrc.io/resource/> .
> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
> @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
> @prefix skos: <http://www.w3.org/2004/02/skos/core#> .
> @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
> @prefix adm: <http://purl.bdrc.io/ontology/admin/> .
> bdr:P58 a :Person ;
> adm:gitRevision "e5e094dd8803f851448aac6ff3a800205ff8ef00" ;
> adm:status bdr:StatusReleased ;
> :hasFather bdr:P4342 ;
> :hasMother bdr:P4343 ;
> :personEvent [ a :PersonOccupiesSeat ;
> :personEventPlace bdr:G227
> ] ;
> :personEvent [ a :PersonOccupiesSeat ;
> :personEventPlace bdr:G227
> ] ;
> :personEvent [ a :PersonBirth ;
> :onOrAbout "1402" ;
> :personEventPlace bdr:G547
> ] ;
> :personEvent [ a :PersonOccupiesSeat ;
> :personEventPlace bdr:G235
> ] ;
> :personEvent [ a :PersonOccupiesSeat ;
> :personEventPlace bdr:G235
> ] ;
> :personEvent [ a :PersonDeath ;
> :onOrAbout "1472"
> ] ;
> :personEvent [ a :PersonDeath ;
> :onOrAbout "1472"
> ] ;
> :personEvent [ a :PersonBirth ;
> :onOrAbout "1402" ;
> :personEventPlace bdr:G547
> ] ;
> :personGender bdr:GenderMale ;
> :personName [ a :PersonPrimaryTitle ;
> rdfs:label "spyan snga blo gros rgyal mtshan/"@bo-x-ewts
> ] ;
> :personName [ a :PersonPrimaryTitle ;
> rdfs:label "spyan snga blo gros rgyal mtshan/"@bo-x-ewts
> ] ;
> :personName [ a :PersonChineseName ;
> rdfs:label "金厄·洛卓坚赞"@zh
> ] ;
> :personName [ a :PersonTitle ;
> rdfs:label "rgya ma spyan snga ba blo gros rgyal mtshan/"@bo-x-ewts
> ] ;
> :personName [ a :PersonPrimaryName ;
> rdfs:label "blo gros rgyal mtshan/"@bo-x-ewts
> ] ;
> :personName [ a :PersonTitle ;
> rdfs:label "rgya ma spyan snga ba blo gros rgyal mtshan/"@bo-x-ewts
> ] ;
> :personName [ a :PersonPrimaryName ;
> rdfs:label "blo gros rgyal mtshan/"@bo-x-ewts
> ] ;
> :personName [ a :PersonFirstOrdinationName ;
> rdfs:label "blo gros rgyal mtshan/"@bo-x-ewts
> ] ;
> :personName [ a :PersonChineseName ;
> rdfs:label "金厄·洛卓坚赞"@zh
> ] ;
> :personName [ a :PersonFirstOrdinationName ;
> rdfs:label "blo gros rgyal mtshan/"@bo-x-ewts
> ] ;
> skos:prefLabel "blo gros rgyal mtshan/"@bo-x-ewts .
> {code}
> This doubling is completely reproducible and the same graphs exhibit doubling
> on each trial.
> Varying the threshold changes which graphs and how many graphs exhibit
> doubling. If the threshold is set higher, e.g., to 100,000 triples per call
> to {{loadDatasetSimple}} then many more graphs exhibit doubling. If the
> threshold is set lower, say to 20,000 triples, then fewer graphs exhibit
> doubling. If only a single model at-a-time is transferred then there is no
> doubling.
> Also if each type of entity is transferred separately - opening the
> connection, transferring all models of the type, then closing down via:
> {code:java}
> public static void closeConnections() {
> TransferHelpers.logger.info("closeConnections fuConn.commit, end, close");
> FusekiHelpers.fuConn.commit();
> FusekiHelpers.fuConn.end();
> FusekiHelpers.fuConn.close();
> }
> {code}
> there is no doubling.
> It appears that models that have already been transferred and committed are
> being written a second time when switching to a new type and upon the first
> transfer via {{loadDatasetSimple}} of the new type.
> I'm hoping there's enough information in this report to identify what sort of
> error in usage of {{RDFConnection}} and/or {{TDB}} would account for this
> behavior. If this appears to be a bug in Jena then I will have to expend more
> effort to create a relatively self-contained test case.
> Here is the relevant portion of the Fuseki configuration:
> {code:java}
> @prefix fuseki: <http://jena.apache.org/fuseki#> .
> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
> @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
> @prefix tdb: <http://jena.hpl.hp.com/2008/tdb#> .
> @prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
> @prefix : <http://base/#> .
> @prefix text: <http://jena.apache.org/text#> .
> @prefix skos: <http://www.w3.org/2004/02/skos/core#> .
> [] rdf:type fuseki:Server ;
> fuseki:services (
> :bdrcrw
> ) .
> :bdrcrw rdf:type fuseki:Service ;
> fuseki:name "bdrcrw" ; # name of the dataset in the url
> fuseki:serviceQuery "query" ; # SPARQL query service
> fuseki:serviceUpdate "update" ; # SPARQL update service
> fuseki:serviceUpload "upload" ; # Non-SPARQL upload service
> fuseki:serviceReadWriteGraphStore "data" ; # SPARQL Graph store protocol (read and write)
> fuseki:dataset :bdrc_text_dataset ;
> .
> :bdrc_text_dataset rdf:type text:TextDataset ;
> text:dataset :dataset_bdrc ;
> text:index :bdrc_lucene_index ;
> .
> :dataset_bdrc rdf:type tdb:DatasetTDB ;
> tdb:location "/etc/fuseki/databases/bdrc" ;
> tdb:unionDefaultGraph true ;
> .
> {code}