[
https://issues.apache.org/jira/browse/JENA-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16368332#comment-16368332
]
Code Ferret edited comment on JENA-1489 at 2/17/18 11:38 PM:
-------------------------------------------------------------
The results that I am seeing are consistent with
{code:java}
loadDataset(data)
loadDataset(data) // Same data
{code}
applied to _only some of the graphs_ being sent. It is the same set of graphs on
every run for a given dataset, and which graphs are affected depends on the size
of the dataset being sent: the larger the dataset, the more duplication.
In our application we have no dependence on the blank node ids.
I'm not familiar with the _ready-to-go_ idea, so I'm not sure how it may apply
to this situation.
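For reference, this is why a repeated load duplicates only blank-node structure. An RDF graph is a set of triples, so re-adding a ground triple changes nothing. However, each time serialized data is parsed, its blank-node labels are mapped to fresh blank nodes, so re-adding the same data creates a new copy of every blank-node subtree. The stdlib-only Java model below is hypothetical and is not Jena code: the class name, the cut-down {{bdr:P58}} triples and the {{_:bN}} labels are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of a triple store, not Jena: triples are "s p o" strings
// and the store is a Set, so adding an identical triple twice has no effect.
final class BlankNodeReload {
    private static int nextId = 0;

    // Cut-down version of the bdr:P58 data; "_:e" is a document-scoped label.
    static final List<String[]> DATA = List.of(
            new String[] {"bdr:P58", "a", ":Person"},
            new String[] {"bdr:P58", ":personEvent", "_:e"},
            new String[] {"_:e", "a", ":PersonDeath"},
            new String[] {"_:e", ":onOrAbout", "\"1472\""});

    // One "parse": each blank-node label is mapped to a node that is fresh for this parse.
    static List<String> parse(List<String[]> source) {
        Map<String, String> labels = new HashMap<>();
        List<String> triples = new ArrayList<>();
        for (String[] t : source) {
            List<String> terms = new ArrayList<>();
            for (String term : t) {
                terms.add(term.startsWith("_:")
                        ? labels.computeIfAbsent(term, k -> "_:b" + nextId++)
                        : term);
            }
            triples.add(String.join(" ", terms));
        }
        return triples;
    }

    // Simulates calling loadDataset(data) n times with the same data.
    static Set<String> loadTimes(int n) {
        Set<String> store = new LinkedHashSet<>();
        for (int i = 0; i < n; i++) {
            store.addAll(parse(DATA));
        }
        return store;
    }

    public static void main(String[] args) {
        loadTimes(2).forEach(System.out::println);
    }
}
```

With {{loadTimes(2)}} the store holds 7 triples. The ground triple {{bdr:P58 a :Person}} appears once, while the three blank-node triples appear twice, which is the same pattern as the doubled {{:personEvent}} entries in the report.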
> models written twice on RDFConnection
> -------------------------------------
>
> Key: JENA-1489
> URL: https://issues.apache.org/jira/browse/JENA-1489
> Project: Apache Jena
> Issue Type: Bug
> Components: Fuseki, Jena, TDB
> Affects Versions: Jena 3.7.0
> Environment: Jena 3.7.0-Snapshot, Java 1.8.0_131 on Mac OS 10.13.3,
> Java 1.8.0_151-8u151-b12-1~deb9u1-b12 on Debian Stretch
> Reporter: Code Ferret
> Priority: Major
>
> *Problem*: I am transferring models via {{RDFConnection}} to {{TDB}} and
> seeing doubling of blank nodes in _some_ graphs as though the same model is
> written a second time *after* a commit during the transfer. I apologize in
> advance for the length of this report.
> *Details*: We have a collection of entity types: Persons, Items, Works and so
> on. Each entity is a graph in a ttl file in a per-type git repo. For each
> type, the ttl files are read from the corresponding repo into models, and the
> models are added to a {{Dataset}} until the number of triples in the dataset
> exceeds a threshold, e.g., 50,000 triples. When the threshold is exceeded,
> the dataset is loaded to Fuseki via an RDFConnection:
> {code:java}
> fuConn = RDFConnectionFactory.connect(baseUrl, baseUrl+"/query",
> baseUrl+"/update", baseUrl+"/data");
> {code}
> which is opened once at the beginning of loading all entity types. The core
> of the loading is performed by:
> {code:java}
> private static void loadDatasetSimple(final Dataset ds) {
>     if (!fuConn.isInTransaction()) {
>         fuConn.begin(ReadWrite.WRITE);
>     }
>     fuConn.loadDataset(ds);
>     fuConn.commit();
> }
> {code}
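For comparison with {{loadDatasetSimple}}: it calls {{begin}} only when no transaction is active and then {{commit}}, but it never calls {{end()}}. The idiom shown in the Jena transaction documentation pairs each {{begin}} with an {{end()}} in a {{finally}} block. The report does not establish whether the missing {{end()}} matters here; the sketch below only illustrates the idiom. {{Tx}} is a hypothetical stdlib stand-in that borrows the method names of Jena's {{Transactional}} (the real {{begin}} takes a {{ReadWrite}} argument), and {{RecordingTx}} is a hypothetical fake that records the call order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in using the method names of Jena's Transactional;
// this is not the Jena interface (the real begin takes a ReadWrite mode).
interface Tx {
    void begin();
    void commit();
    void abort();
    void end();
    boolean isInTransaction();
}

final class TxIdiom {
    // Runs one unit of work in its own write transaction: begin, work, commit, end.
    static void inWriteTxn(Tx tx, Runnable work) {
        tx.begin();
        try {
            work.run();
            tx.commit();
        } finally {
            // Always finish the transaction, even if the work or the commit throws.
            tx.end();
        }
    }
}

// Fake that records calls; in this fake only end() and abort() close the transaction.
final class RecordingTx implements Tx {
    final List<String> calls = new ArrayList<>();
    private boolean active = false;

    public void begin() { calls.add("begin"); active = true; }
    public void commit() { calls.add("commit"); }
    public void abort() { calls.add("abort"); active = false; }
    public void end() { calls.add("end"); active = false; }
    public boolean isInTransaction() { return active; }
}
```

In Jena 3.x code the same begin/commit/end sequence is available as {{org.apache.jena.system.Txn.executeWrite(fuConn, () -> fuConn.loadDataset(ds))}}, which avoids the {{isInTransaction()}} check altogether.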
> {{loadDatasetSimple}} is called repeatedly until all of the entities of a given
> type have been loaded from the corresponding repo. Since some models may not
> yet have been transferred once all of the entities of a given type have been
> read, a finish method is then called:
> {code:java}
> static void finishDatasetTransfers() {
>     // if map is not empty, transfer the last one
>     if (currentDataset != null) {
>         loadDatasetSimple(currentDataset);
>     }
> }
> {code}
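The batching flow described above can be sketched with the standard library alone: accumulate entity graphs until the triple count exceeds the threshold, transfer the batch, and transfer any remainder in a finish step. {{TripleBatcher}} is hypothetical. Each graph is represented only by its triple count, and the {{flush}} callback stands in for {{loadDatasetSimple}}.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical batching helper: each entity graph is represented only by its triple count.
final class TripleBatcher {
    private final long threshold;
    private final Consumer<List<Integer>> flush; // stands in for loadDatasetSimple
    private List<Integer> current = new ArrayList<>();
    private long triplesInBatch = 0;

    TripleBatcher(long threshold, Consumer<List<Integer>> flush) {
        this.threshold = threshold;
        this.flush = flush;
    }

    // Adds one graph; transfers the batch once its triple count exceeds the threshold.
    void add(int graphTriples) {
        current.add(graphTriples);
        triplesInBatch += graphTriples;
        if (triplesInBatch > threshold) {
            flushCurrent();
        }
    }

    // Counterpart of finishDatasetTransfers: transfer whatever is left over.
    void finish() {
        if (!current.isEmpty()) {
            flushCurrent();
        }
    }

    private void flushCurrent() {
        flush.accept(current);
        current = new ArrayList<>(); // start a new batch so sent graphs are never resent
        triplesInBatch = 0;
    }
}
```

With a threshold of 10 and graphs of 4, 4, 4, 5, 6 and 3 triples, the batches sent are [4, 4, 4], [5, 6] and, from {{finish()}}, [3]. Allocating a new batch after each flush, rather than reusing the same object, guarantees in this sketch that each graph is passed to {{flush}} exactly once.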
> After a given type of entity has been loaded, the next type in the list of
> types to transfer is processed as described above, and this is when the
> problem is noticed.
> Once enough models of the next type have been added to the transfer dataset
> and that dataset is transferred via {{loadDatasetSimple}}, _some_ of the
> previously transferred graphs exhibit doubled blank nodes. Here is the output
> of {{describe bdr:P58}}, which illustrates the doubling:
> {code:java}
> @prefix : <http://purl.bdrc.io/ontology/core/> .
> @prefix bdr: <http://purl.bdrc.io/resource/> .
> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
> @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
> @prefix skos: <http://www.w3.org/2004/02/skos/core#> .
> @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
> @prefix adm: <http://purl.bdrc.io/ontology/admin/> .
> bdr:P58 a :Person ;
>     adm:gitRevision "e5e094dd8803f851448aac6ff3a800205ff8ef00" ;
>     adm:status bdr:StatusReleased ;
>     :hasFather bdr:P4342 ;
>     :hasMother bdr:P4343 ;
>     :personEvent [ a :PersonOccupiesSeat ;
>         :personEventPlace bdr:G227
>     ] ;
>     :personEvent [ a :PersonOccupiesSeat ;
>         :personEventPlace bdr:G227
>     ] ;
>     :personEvent [ a :PersonBirth ;
>         :onOrAbout "1402" ;
>         :personEventPlace bdr:G547
>     ] ;
>     :personEvent [ a :PersonOccupiesSeat ;
>         :personEventPlace bdr:G235
>     ] ;
>     :personEvent [ a :PersonOccupiesSeat ;
>         :personEventPlace bdr:G235
>     ] ;
>     :personEvent [ a :PersonDeath ;
>         :onOrAbout "1472"
>     ] ;
>     :personEvent [ a :PersonDeath ;
>         :onOrAbout "1472"
>     ] ;
>     :personEvent [ a :PersonBirth ;
>         :onOrAbout "1402" ;
>         :personEventPlace bdr:G547
>     ] ;
>     :personGender bdr:GenderMale ;
>     :personName [ a :PersonPrimaryTitle ;
>         rdfs:label "spyan snga blo gros rgyal mtshan/"@bo-x-ewts
>     ] ;
>     :personName [ a :PersonPrimaryTitle ;
>         rdfs:label "spyan snga blo gros rgyal mtshan/"@bo-x-ewts
>     ] ;
>     :personName [ a :PersonChineseName ;
>         rdfs:label "金厄·洛卓坚赞"@zh
>     ] ;
>     :personName [ a :PersonTitle ;
>         rdfs:label "rgya ma spyan snga ba blo gros rgyal mtshan/"@bo-x-ewts
>     ] ;
>     :personName [ a :PersonPrimaryName ;
>         rdfs:label "blo gros rgyal mtshan/"@bo-x-ewts
>     ] ;
>     :personName [ a :PersonTitle ;
>         rdfs:label "rgya ma spyan snga ba blo gros rgyal mtshan/"@bo-x-ewts
>     ] ;
>     :personName [ a :PersonPrimaryName ;
>         rdfs:label "blo gros rgyal mtshan/"@bo-x-ewts
>     ] ;
>     :personName [ a :PersonFirstOrdinationName ;
>         rdfs:label "blo gros rgyal mtshan/"@bo-x-ewts
>     ] ;
>     :personName [ a :PersonChineseName ;
>         rdfs:label "金厄·洛卓坚赞"@zh
>     ] ;
>     :personName [ a :PersonFirstOrdinationName ;
>         rdfs:label "blo gros rgyal mtshan/"@bo-x-ewts
>     ] ;
>     skos:prefLabel "blo gros rgyal mtshan/"@bo-x-ewts .
> {code}
> This doubling is completely reproducible and the same graphs exhibit doubling
> on each trial.
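To find which graphs are affected without reading each {{describe}} output by eye, one option is to group blank nodes by their content and flag groups that occur more than once. The sketch below is hypothetical and stdlib-only: it takes one resource's triples as {{String[] {s, p, o}}} (blank nodes written {{_:label}}), builds a signature from the linking property plus the blank node's sorted property/value pairs, and returns the signatures seen at least twice. It looks only one level deep, which matches the shape of the {{bdr:P58}} output above.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical diagnostic over simplified triples {s, p, o}; blank nodes start with "_:".
final class DoubledBlankNodes {
    // Returns each repeated blank-node signature with the number of times it occurs.
    static Map<String, Integer> duplicates(List<String[]> triples) {
        Map<String, SortedSet<String>> content = new HashMap<>(); // bnode -> its sorted "p o" pairs
        Map<String, String> linkedBy = new HashMap<>();           // bnode -> property pointing at it
        for (String[] t : triples) {
            if (t[0].startsWith("_:")) {
                content.computeIfAbsent(t[0], k -> new TreeSet<>()).add(t[1] + " " + t[2]);
            }
            if (t[2].startsWith("_:")) {
                linkedBy.put(t[2], t[1]);
            }
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, SortedSet<String>> e : content.entrySet()) {
            String signature = linkedBy.getOrDefault(e.getKey(), "?") + " " + e.getValue();
            counts.merge(signature, 1, Integer::sum);
        }
        counts.values().removeIf(n -> n < 2); // keep only signatures seen twice or more
        return counts;
    }
}
```

Run over the doubled {{:personEvent [ a :PersonDeath ; :onOrAbout "1472" ]}} pair, this reports one signature with a count of 2, while a blank node that occurs only once is not reported.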
> Varying the threshold changes which graphs, and how many, exhibit doubling.
> If the threshold is set higher, e.g., to 100,000 triples per call to
> {{loadDatasetSimple}}, then many more graphs exhibit doubling. If the
> threshold is set lower, say to 20,000 triples, then fewer graphs exhibit
> doubling. If only a single model at a time is transferred, there is no
> doubling.
> Also, if each type of entity is transferred separately (opening the
> connection, transferring all models of the type, and then closing down via
> the following method), there is no doubling:
> {code:java}
> public static void closeConnections() {
>     TransferHelpers.logger.info("closeConnections fuConn.commit, end, close");
>     FusekiHelpers.fuConn.commit();
>     FusekiHelpers.fuConn.end();
>     FusekiHelpers.fuConn.close();
> }
> {code}
> It appears that models that have already been transferred and committed are
> being written a second time when switching to a new type and upon the first
> transfer via {{loadDatasetSimple}} of the new type.
> I'm hoping there's enough information in this report to identify what sort of
> usage error with {{RDFConnection}} and/or {{TDB}} would account for this
> behavior. If this appears to be a bug in Jena, then I will have to put in more
> effort to create a relatively self-contained test case.
> Here is the relevant portion of the Fuseki configuration:
> {code:java}
> @prefix fuseki: <http://jena.apache.org/fuseki#> .
> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
> @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
> @prefix tdb: <http://jena.hpl.hp.com/2008/tdb#> .
> @prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
> @prefix : <http://base/#> .
> @prefix text: <http://jena.apache.org/text#> .
> @prefix skos: <http://www.w3.org/2004/02/skos/core#> .
> [] rdf:type fuseki:Server ;
>     fuseki:services (
>         :bdrcrw
>     ) .
> :bdrcrw rdf:type fuseki:Service ;
>     fuseki:name "bdrcrw" ;                      # name of the dataset in the url
>     fuseki:serviceQuery "query" ;               # SPARQL query service
>     fuseki:serviceUpdate "update" ;             # SPARQL update service
>     fuseki:serviceUpload "upload" ;             # Non-SPARQL upload service
>     fuseki:serviceReadWriteGraphStore "data" ;  # SPARQL Graph store protocol (read and write)
>     fuseki:dataset :bdrc_text_dataset ;
>     .
> :bdrc_text_dataset rdf:type text:TextDataset ;
>     text:dataset :dataset_bdrc ;
>     text:index :bdrc_lucene_index ;
>     .
> :dataset_bdrc rdf:type tdb:DatasetTDB ;
>     tdb:location "/etc/fuseki/databases/bdrc" ;
>     tdb:unionDefaultGraph true ;
>     .
> {code}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)