[
https://issues.apache.org/jira/browse/JENA-275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andy Seaborne closed JENA-275.
------------------------------
> different query results for tdbloader and tdbloader3
> ----------------------------------------------------
>
> Key: JENA-275
> URL: https://issues.apache.org/jira/browse/JENA-275
> Project: Apache Jena
> Issue Type: Question
> Components: TDB
> Affects Versions: TDB 0.9.2
> Reporter: Jon Phillips
> Assignee: Andy Seaborne
>
> I had intended to use tdbloader3 rather than tdbloader for loading some large
> data sets (> 100 million triples) because I was seeing higher sustained
> triples-per-second load rates. However, I am running into some immediate
> issues running basic queries on the resulting models, even on small toy test
> sets. In one simple case, a SPARQL query with a fixed predicate but unbound
> subject and object (excuse my novice grasp of terminology) fails to return
> any results for the model loaded with tdbloader3.
> Here is the sequence of steps that I ran:
> cat dbpedia.nt (list of 9 triples from DBpedia)
> <http://dbpedia.org/resource/AccessibleComputing> <http://www.w3.org/2000/01/rdf-schema#label> "AccessibleComputing"@en .
> <http://dbpedia.org/resource/AfghanistanGeography> <http://www.w3.org/2000/01/rdf-schema#label> "AfghanistanGeography"@en .
> <http://dbpedia.org/resource/AfghanistanHistory> <http://www.w3.org/2000/01/rdf-schema#label> "AfghanistanHistory"@en .
> <http://dbpedia.org/resource/AfghanistanPeople> <http://www.w3.org/2000/01/rdf-schema#label> "AfghanistanPeople"@en .
> <http://dbpedia.org/resource/AfghanistanCommunications> <http://www.w3.org/2000/01/rdf-schema#label> "AfghanistanCommunications"@en .
> <http://dbpedia.org/resource/AfghanistanTransportations> <http://www.w3.org/2000/01/rdf-schema#label> "AfghanistanTransportations"@en .
> <http://dbpedia.org/resource/AfghanistanMilitary> <http://www.w3.org/2000/01/rdf-schema#label> "AfghanistanMilitary"@en .
> <http://dbpedia.org/resource/AfghanistanTransnationalIssues> <http://www.w3.org/2000/01/rdf-schema#label> "AfghanistanTransnationalIssues"@en .
> <http://dbpedia.org/resource/AmoeboidTaxa> <http://www.w3.org/2000/01/rdf-schema#label> "AmoeboidTaxa"@en .
> build the model with tdbloader
> tdbloader --loc=dbpedia_tdbl1 dbpedia.nt
> 23:18:29 INFO loader :: -- Start triples data phase
> 23:18:29 INFO loader :: ** Load empty triples table
> 23:18:29 INFO loader :: Load: dbpedia.nt -- 2012/07/11 23:18:29 EDT
> 23:18:29 INFO loader :: -- Finish triples data phase
> 23:18:29 INFO loader :: 9 triples loaded in 0.04 seconds [Rate: 214.29 per second]
> 23:18:29 INFO loader :: -- Start triples index phase
> 23:18:29 INFO loader :: ** Index SPO->POS: 9 slots indexed in 0.00 seconds [Rate: 9,000.00 per second]
> 23:18:29 INFO loader :: ** Index SPO->OSP: 9 slots indexed in 0.00 seconds [Rate: 9,000.00 per second]
> 23:18:29 INFO loader :: -- Finish triples index phase
> 23:18:29 INFO loader :: ** 9 triples indexed in 0.00 seconds [Rate: 1,800.00 per second]
> 23:18:29 INFO loader :: -- Finish triples load
> 23:18:29 INFO loader :: ** Completed: 9 triples loaded in 0.05 seconds [Rate: 163.64 per second]
> now build the same model with tdbloader3
> tdbloader3 --loc=dbpedia_tdbl3 dbpedia.nt
> 23:18:38 INFO tdbloader3 :: Load: dbpedia.nt -- 2012/07/11 23:18:38 EDT
> 23:18:38 INFO tdbloader3 :: Node Table (1/3): building nodes.dat and sorting hash|id ...
> 23:18:38 INFO tdbloader3 :: Total: 27 tuples : 0.01 seconds : 1,928.57 tuples/sec [2012/07/11 23:18:38 EDT]
> 23:18:38 INFO tdbloader3 :: Node Table (2/3): generating input data using node ids...
> 23:18:38 INFO tdbloader3 :: Total: 8 tuples : 0.03 seconds : 275.86 tuples/sec [2012/07/11 23:18:38 EDT]
> 23:18:38 INFO tdbloader3 :: Node Table (3/3): building node table B+Tree index (i.e. node2id.dat and node2id.idn files)...
> 23:18:39 INFO tdbloader3 :: Total: 19 tuples : 0.08 seconds : 234.57 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO tdbloader3 :: Index: creating SPO index...
> 23:18:39 INFO tdbloader3 :: Total: 9 tuples : 0.01 seconds : 1,500.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO tdbloader3 :: Index: creating GSPO index...
> 23:18:39 INFO tdbloader3 :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO tdbloader3 :: Index: sorting data for POS index...
> 23:18:39 INFO tdbloader3 :: Total: 9 tuples : 0.00 seconds : 4,500.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO tdbloader3 :: Index: creating POS index...
> 23:18:39 INFO tdbloader3 :: Total: 9 tuples : 0.01 seconds : 1,125.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO tdbloader3 :: Index: sorting data for OSP index...
> 23:18:39 INFO tdbloader3 :: Total: 9 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO tdbloader3 :: Index: creating OSP index...
> 23:18:39 INFO tdbloader3 :: Total: 9 tuples : 0.00 seconds : 1,800.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO tdbloader3 :: Index: sorting data for GPOS index...
> 23:18:39 INFO tdbloader3 :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO tdbloader3 :: Index: creating GPOS index...
> 23:18:39 INFO tdbloader3 :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO tdbloader3 :: Index: sorting data for GOSP index...
> 23:18:39 INFO tdbloader3 :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO tdbloader3 :: Index: creating GOSP index...
> 23:18:39 INFO tdbloader3 :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO tdbloader3 :: Index: sorting data for POSG index...
> 23:18:39 INFO tdbloader3 :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO tdbloader3 :: Index: creating POSG index...
> 23:18:39 INFO tdbloader3 :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO tdbloader3 :: Index: sorting data for OSPG index...
> 23:18:39 INFO tdbloader3 :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO tdbloader3 :: Index: creating OSPG index...
> 23:18:39 INFO tdbloader3 :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO tdbloader3 :: Index: sorting data for SPOG index...
> 23:18:39 INFO tdbloader3 :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO tdbloader3 :: Index: creating SPOG index...
> 23:18:39 INFO tdbloader3 :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO tdbloader3 :: Total: 9 tuples : 0.45 seconds : 20.18 tuples/sec [2012/07/11 23:18:39 EDT]
> two simple queries that select every triple return the same set of triples
> from both models (only the row order differs):
> ./tdbquery --loc=dbpedia_tdbl1 "SELECT ?x ?y ?z WHERE { ?x ?y ?z }"
> -----------------------------------------------------------------------------------------------------------------------------------------------------
> | x                                                            | y                                            | z                                   |
> =====================================================================================================================================================
> | <http://dbpedia.org/resource/AccessibleComputing>            | <http://www.w3.org/2000/01/rdf-schema#label> | "AccessibleComputing"@en            |
> | <http://dbpedia.org/resource/AfghanistanGeography>           | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanGeography"@en           |
> | <http://dbpedia.org/resource/AfghanistanHistory>             | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanHistory"@en             |
> | <http://dbpedia.org/resource/AfghanistanPeople>              | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanPeople"@en              |
> | <http://dbpedia.org/resource/AfghanistanCommunications>      | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanCommunications"@en      |
> | <http://dbpedia.org/resource/AfghanistanTransportations>     | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanTransportations"@en     |
> | <http://dbpedia.org/resource/AfghanistanMilitary>            | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanMilitary"@en            |
> | <http://dbpedia.org/resource/AfghanistanTransnationalIssues> | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanTransnationalIssues"@en |
> | <http://dbpedia.org/resource/AmoeboidTaxa>                   | <http://www.w3.org/2000/01/rdf-schema#label> | "AmoeboidTaxa"@en                   |
> -----------------------------------------------------------------------------------------------------------------------------------------------------
> same result for the model built with tdbloader3
> ./tdbquery --loc=dbpedia_tdbl3 "SELECT ?x ?y ?z WHERE { ?x ?y ?z }"
> -----------------------------------------------------------------------------------------------------------------------------------------------------
> | x                                                            | y                                            | z                                   |
> =====================================================================================================================================================
> | <http://dbpedia.org/resource/AccessibleComputing>            | <http://www.w3.org/2000/01/rdf-schema#label> | "AccessibleComputing"@en            |
> | <http://dbpedia.org/resource/AfghanistanCommunications>      | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanCommunications"@en      |
> | <http://dbpedia.org/resource/AfghanistanGeography>           | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanGeography"@en           |
> | <http://dbpedia.org/resource/AfghanistanHistory>             | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanHistory"@en             |
> | <http://dbpedia.org/resource/AfghanistanMilitary>            | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanMilitary"@en            |
> | <http://dbpedia.org/resource/AfghanistanPeople>              | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanPeople"@en              |
> | <http://dbpedia.org/resource/AfghanistanTransnationalIssues> | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanTransnationalIssues"@en |
> | <http://dbpedia.org/resource/AfghanistanTransportations>     | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanTransportations"@en     |
> | <http://dbpedia.org/resource/AmoeboidTaxa>                   | <http://www.w3.org/2000/01/rdf-schema#label> | "AmoeboidTaxa"@en                   |
> -----------------------------------------------------------------------------------------------------------------------------------------------------
> a different query, run on the model built with tdbloader, that fixes the
> predicate:
> ./tdbquery --loc=dbpedia_tdbl1 "SELECT ?x ?y ?z WHERE { ?x <http://www.w3.org/2000/01/rdf-schema#label> ?z }"
> ----------------------------------------------------------------------------------------------------------
> | x                                                            | y | z                                   |
> ==========================================================================================================
> | <http://dbpedia.org/resource/AccessibleComputing>            |   | "AccessibleComputing"@en            |
> | <http://dbpedia.org/resource/AfghanistanGeography>           |   | "AfghanistanGeography"@en           |
> | <http://dbpedia.org/resource/AfghanistanHistory>             |   | "AfghanistanHistory"@en             |
> | <http://dbpedia.org/resource/AfghanistanPeople>              |   | "AfghanistanPeople"@en              |
> | <http://dbpedia.org/resource/AfghanistanCommunications>      |   | "AfghanistanCommunications"@en      |
> | <http://dbpedia.org/resource/AfghanistanTransportations>     |   | "AfghanistanTransportations"@en     |
> | <http://dbpedia.org/resource/AfghanistanMilitary>            |   | "AfghanistanMilitary"@en            |
> | <http://dbpedia.org/resource/AfghanistanTransnationalIssues> |   | "AfghanistanTransnationalIssues"@en |
> | <http://dbpedia.org/resource/AmoeboidTaxa>                   |   | "AmoeboidTaxa"@en                   |
> ----------------------------------------------------------------------------------------------------------
> I expected the data loaded with tdbloader3 to return the same result, but the
> query returned an empty result:
> tdbquery --loc=dbpedia_tdbl3 "SELECT ?x ?y ?z WHERE { ?x <http://www.w3.org/2000/01/rdf-schema#label> ?z }"
> -------------
> | x | y | z |
> =============
> -------------
> Any help would be much appreciated.
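
To make the expected behaviour concrete, here is a minimal sketch in plain Python (not Jena or TDB code) of matching a triple pattern with a fixed predicate against the nine triples from dbpedia.nt. The helper names `parse_ntriples_line` and `match` are hypothetical, and the parser assumes the simple `<s> <p> "literal"@lang .` shape used in this data rather than the full N-Triples grammar.

```python
# Illustrative only: a tiny in-memory triple-pattern matcher, not Jena/TDB code.
RDFS_LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"

# Local names of the nine resources listed in the report.
NAMES = [
    "AccessibleComputing",
    "AfghanistanGeography",
    "AfghanistanHistory",
    "AfghanistanPeople",
    "AfghanistanCommunications",
    "AfghanistanTransportations",
    "AfghanistanMilitary",
    "AfghanistanTransnationalIssues",
    "AmoeboidTaxa",
]

# Rebuild the N-Triples lines from the report, one triple per line.
NT_LINES = [
    f'<http://dbpedia.org/resource/{name}> {RDFS_LABEL} "{name}"@en .'
    for name in NAMES
]


def parse_ntriples_line(line):
    """Split one simple N-Triples line into (subject, predicate, object).

    Assumption: subject and predicate are IRIs without spaces, and the
    line ends with " ." as in the report. This is not a full parser.
    """
    body = line.strip()
    if not body.endswith("."):
        raise ValueError(f"not a complete triple: {line!r}")
    body = body[:-1].rstrip()
    subject, predicate, obj = body.split(None, 2)
    return subject, predicate, obj


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as an unbound variable."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


TRIPLES = [parse_ntriples_line(line) for line in NT_LINES]

# { ?x ?y ?z } : everything unbound.
all_rows = match(TRIPLES)
# { ?x rdfs:label ?z } : predicate fixed, subject and object unbound.
label_rows = match(TRIPLES, p=RDFS_LABEL)

if __name__ == "__main__":
    print(len(all_rows), len(label_rows))
    # Row order carries no meaning here, so compare as sets.
    print(set(all_rows) == set(label_rows))
```

Every triple in the test file uses rdfs:label, so the fixed-predicate pattern should select the same nine triples as the unconstrained one, which is what the tdbloader-built model returns. SPARQL does not define a result order without ORDER BY, so the differing row order between the two full listings is not itself a discrepancy; the empty result is.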