Jon Phillips created JENA-275:
---------------------------------
Summary: different query results for tdbloader and tdbloader3
Key: JENA-275
URL: https://issues.apache.org/jira/browse/JENA-275
Project: Apache Jena
Issue Type: Question
Components: TDB
Affects Versions: TDB 0.9.2
Reporter: Jon Phillips
Priority: Minor
I had intended to use tdbloader3 rather than tdbloader for loading some large
data sets (> 100 million triples) because I was seeing higher sustained
triples-per-second load rates. However, I am running into immediate issues
running basic queries on the resulting models, even on small toy test sets. In
one simple case, a SPARQL query with a fixed predicate but unbound subject and
object (excuse my novice grasp of terminology) fails to return any results for
the model loaded with tdbloader3.
Here is the sequence of steps that I ran:
cat dbpedia.nt (list of 9 triples from dbpedia)
<http://dbpedia.org/resource/AccessibleComputing>
<http://www.w3.org/2000/01/rdf-schema#label> "AccessibleComputing"@en .
<http://dbpedia.org/resource/AfghanistanGeography>
<http://www.w3.org/2000/01/rdf-schema#label> "AfghanistanGeography"@en .
<http://dbpedia.org/resource/AfghanistanHistory>
<http://www.w3.org/2000/01/rdf-schema#label> "AfghanistanHistory"@en .
<http://dbpedia.org/resource/AfghanistanPeople>
<http://www.w3.org/2000/01/rdf-schema#label> "AfghanistanPeople"@en .
<http://dbpedia.org/resource/AfghanistanCommunications>
<http://www.w3.org/2000/01/rdf-schema#label> "AfghanistanCommunications"@en .
<http://dbpedia.org/resource/AfghanistanTransportations>
<http://www.w3.org/2000/01/rdf-schema#label> "AfghanistanTransportations"@en .
<http://dbpedia.org/resource/AfghanistanMilitary>
<http://www.w3.org/2000/01/rdf-schema#label> "AfghanistanMilitary"@en .
<http://dbpedia.org/resource/AfghanistanTransnationalIssues>
<http://www.w3.org/2000/01/rdf-schema#label>
"AfghanistanTransnationalIssues"@en .
<http://dbpedia.org/resource/AmoeboidTaxa>
<http://www.w3.org/2000/01/rdf-schema#label> "AmoeboidTaxa"@en .
build the model with tdbloader
tdbloader --loc=dbpedia_tdbl1 dbpedia.nt
23:18:29 INFO loader :: -- Start triples data phase
23:18:29 INFO loader :: ** Load empty triples table
23:18:29 INFO loader :: Load: dbpedia.nt -- 2012/07/11 23:18:29 EDT
23:18:29 INFO loader :: -- Finish triples data phase
23:18:29 INFO loader :: 9 triples loaded in 0.04 seconds [Rate: 214.29 per second]
23:18:29 INFO loader :: -- Start triples index phase
23:18:29 INFO loader :: ** Index SPO->POS: 9 slots indexed in 0.00 seconds [Rate: 9,000.00 per second]
23:18:29 INFO loader :: ** Index SPO->OSP: 9 slots indexed in 0.00 seconds [Rate: 9,000.00 per second]
23:18:29 INFO loader :: -- Finish triples index phase
23:18:29 INFO loader :: ** 9 triples indexed in 0.00 seconds [Rate: 1,800.00 per second]
23:18:29 INFO loader :: -- Finish triples load
23:18:29 INFO loader :: ** Completed: 9 triples loaded in 0.05 seconds [Rate: 163.64 per second]
now build the same model with tdbloader3
tdbloader3 --loc=dbpedia_tdbl3 dbpedia.nt
23:18:38 INFO tdbloader3 :: Load: dbpedia.nt -- 2012/07/11 23:18:38 EDT
23:18:38 INFO tdbloader3 :: Node Table (1/3): building nodes.dat and sorting hash|id ...
23:18:38 INFO tdbloader3 :: Total: 27 tuples : 0.01 seconds : 1,928.57 tuples/sec [2012/07/11 23:18:38 EDT]
23:18:38 INFO tdbloader3 :: Node Table (2/3): generating input data using node ids...
23:18:38 INFO tdbloader3 :: Total: 8 tuples : 0.03 seconds : 275.86 tuples/sec [2012/07/11 23:18:38 EDT]
23:18:38 INFO tdbloader3 :: Node Table (3/3): building node table B+Tree index (i.e. node2id.dat and node2id.idn files)...
23:18:39 INFO tdbloader3 :: Total: 19 tuples : 0.08 seconds : 234.57 tuples/sec [2012/07/11 23:18:39 EDT]
23:18:39 INFO tdbloader3 :: Index: creating SPO index...
23:18:39 INFO tdbloader3 :: Total: 9 tuples : 0.01 seconds : 1,500.00 tuples/sec [2012/07/11 23:18:39 EDT]
23:18:39 INFO tdbloader3 :: Index: creating GSPO index...
23:18:39 INFO tdbloader3 :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
23:18:39 INFO tdbloader3 :: Index: sorting data for POS index...
23:18:39 INFO tdbloader3 :: Total: 9 tuples : 0.00 seconds : 4,500.00 tuples/sec [2012/07/11 23:18:39 EDT]
23:18:39 INFO tdbloader3 :: Index: creating POS index...
23:18:39 INFO tdbloader3 :: Total: 9 tuples : 0.01 seconds : 1,125.00 tuples/sec [2012/07/11 23:18:39 EDT]
23:18:39 INFO tdbloader3 :: Index: sorting data for OSP index...
23:18:39 INFO tdbloader3 :: Total: 9 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
23:18:39 INFO tdbloader3 :: Index: creating OSP index...
23:18:39 INFO tdbloader3 :: Total: 9 tuples : 0.00 seconds : 1,800.00 tuples/sec [2012/07/11 23:18:39 EDT]
23:18:39 INFO tdbloader3 :: Index: sorting data for GPOS index...
23:18:39 INFO tdbloader3 :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
23:18:39 INFO tdbloader3 :: Index: creating GPOS index...
23:18:39 INFO tdbloader3 :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
23:18:39 INFO tdbloader3 :: Index: sorting data for GOSP index...
23:18:39 INFO tdbloader3 :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
23:18:39 INFO tdbloader3 :: Index: creating GOSP index...
23:18:39 INFO tdbloader3 :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
23:18:39 INFO tdbloader3 :: Index: sorting data for POSG index...
23:18:39 INFO tdbloader3 :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
23:18:39 INFO tdbloader3 :: Index: creating POSG index...
23:18:39 INFO tdbloader3 :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
23:18:39 INFO tdbloader3 :: Index: sorting data for OSPG index...
23:18:39 INFO tdbloader3 :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
23:18:39 INFO tdbloader3 :: Index: creating OSPG index...
23:18:39 INFO tdbloader3 :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
23:18:39 INFO tdbloader3 :: Index: sorting data for SPOG index...
23:18:39 INFO tdbloader3 :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
23:18:39 INFO tdbloader3 :: Index: creating SPOG index...
23:18:39 INFO tdbloader3 :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
23:18:39 INFO tdbloader3 :: Total: 9 tuples : 0.45 seconds : 20.18 tuples/sec [2012/07/11 23:18:39 EDT]
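As a sanity check on the tuple counts in the tdbloader3 log, here is a rough Python sketch (not part of Jena) that counts term occurrences and distinct terms in the nine triples above. The regular expression is a simplified assumption that only handles IRIs and plain or language-tagged literals, which is all this file contains. The 27 and 19 it prints appear to line up with the totals reported for Node Table steps 1/3 and 3/3, though I am only guessing that this is what those steps count.

```python
import re

# The nine triples from dbpedia.nt above, rebuilt as N-Triples lines.
LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"
NAMES = [
    "AccessibleComputing", "AfghanistanGeography", "AfghanistanHistory",
    "AfghanistanPeople", "AfghanistanCommunications",
    "AfghanistanTransportations", "AfghanistanMilitary",
    "AfghanistanTransnationalIssues", "AmoeboidTaxa",
]
NT_DATA = "\n".join(
    f'<http://dbpedia.org/resource/{n}> {LABEL} "{n}"@en .' for n in NAMES
)

# Simplified term pattern (assumption): IRIs and plain/language-tagged
# literals only; typed literals and blank nodes are not handled.
TERM = re.compile(r'<[^>]*>|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+)?')


def count_terms(nt_text):
    """Return (triples, term occurrences, distinct terms)."""
    occurrences = []
    for line in nt_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        terms = TERM.findall(line)
        if len(terms) != 3:
            raise ValueError(f"unsupported line: {line!r}")
        occurrences.extend(terms)
    return len(occurrences) // 3, len(occurrences), len(set(occurrences))


triples, occurrences, distinct = count_terms(NT_DATA)
print(triples, occurrences, distinct)  # 9 27 19
```

The 19 distinct terms are the 9 subjects, the single rdfs:label predicate, and the 9 literals.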
A simple query that returns every triple gives the same set of triples for
both models (only the row order differs). For the model built with tdbloader:
./tdbquery --loc=dbpedia_tdbl1 "SELECT ?x ?y ?z WHERE { ?x ?y ?z }"
-----------------------------------------------------------------------------------------------------------------------------------------------------
| x | y | z |
=====================================================================================================================================================
| <http://dbpedia.org/resource/AccessibleComputing> | <http://www.w3.org/2000/01/rdf-schema#label> | "AccessibleComputing"@en |
| <http://dbpedia.org/resource/AfghanistanGeography> | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanGeography"@en |
| <http://dbpedia.org/resource/AfghanistanHistory> | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanHistory"@en |
| <http://dbpedia.org/resource/AfghanistanPeople> | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanPeople"@en |
| <http://dbpedia.org/resource/AfghanistanCommunications> | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanCommunications"@en |
| <http://dbpedia.org/resource/AfghanistanTransportations> | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanTransportations"@en |
| <http://dbpedia.org/resource/AfghanistanMilitary> | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanMilitary"@en |
| <http://dbpedia.org/resource/AfghanistanTransnationalIssues> | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanTransnationalIssues"@en |
| <http://dbpedia.org/resource/AmoeboidTaxa> | <http://www.w3.org/2000/01/rdf-schema#label> | "AmoeboidTaxa"@en |
-----------------------------------------------------------------------------------------------------------------------------------------------------
same result for the model built with tdbloader3
./tdbquery --loc=dbpedia_tdbl3 "SELECT ?x ?y ?z WHERE { ?x ?y ?z }"
-----------------------------------------------------------------------------------------------------------------------------------------------------
| x | y | z |
=====================================================================================================================================================
| <http://dbpedia.org/resource/AccessibleComputing> | <http://www.w3.org/2000/01/rdf-schema#label> | "AccessibleComputing"@en |
| <http://dbpedia.org/resource/AfghanistanCommunications> | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanCommunications"@en |
| <http://dbpedia.org/resource/AfghanistanGeography> | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanGeography"@en |
| <http://dbpedia.org/resource/AfghanistanHistory> | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanHistory"@en |
| <http://dbpedia.org/resource/AfghanistanMilitary> | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanMilitary"@en |
| <http://dbpedia.org/resource/AfghanistanPeople> | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanPeople"@en |
| <http://dbpedia.org/resource/AfghanistanTransnationalIssues> | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanTransnationalIssues"@en |
| <http://dbpedia.org/resource/AfghanistanTransportations> | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanTransportations"@en |
| <http://dbpedia.org/resource/AmoeboidTaxa> | <http://www.w3.org/2000/01/rdf-schema#label> | "AmoeboidTaxa"@en |
-----------------------------------------------------------------------------------------------------------------------------------------------------
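Since the two listings come back in a different row order, I compared them as sets rather than by eye. Below is a minimal Python sketch of that comparison. It assumes the tdbquery text output is captured unwrapped (one row per line) and that no value contains a `|` character; `parse_table`, `diff_results` and `run_tdbquery` are names I made up for illustration, and the sample strings hold just two of the rows shown above.

```python
import subprocess


def parse_table(text):
    """Parse tdbquery's text table into (header, set of row tuples)."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the ---- / ==== separator lines.
        if not (line.startswith("|") and line.endswith("|")):
            continue
        # Assumption: cell values never contain a literal '|'.
        rows.append(tuple(cell.strip() for cell in line[1:-1].split("|")))
    if not rows:
        return (), set()
    return rows[0], set(rows[1:])


def diff_results(a_text, b_text):
    """Return (rows only in a, rows only in b), ignoring row order."""
    _, a = parse_table(a_text)
    _, b = parse_table(b_text)
    return a - b, b - a


def run_tdbquery(loc, query):
    """Run tdbquery as in the commands above (needs tdbquery on PATH)."""
    done = subprocess.run(["tdbquery", f"--loc={loc}", query],
                          capture_output=True, text=True, check=True)
    return done.stdout


LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"
OUT_TDBL1 = f"""
| x | y | z |
=============
| <http://dbpedia.org/resource/AccessibleComputing> | {LABEL} | "AccessibleComputing"@en |
| <http://dbpedia.org/resource/AfghanistanGeography> | {LABEL} | "AfghanistanGeography"@en |
"""
OUT_TDBL3 = f"""
| x | y | z |
=============
| <http://dbpedia.org/resource/AfghanistanGeography> | {LABEL} | "AfghanistanGeography"@en |
| <http://dbpedia.org/resource/AccessibleComputing> | {LABEL} | "AccessibleComputing"@en |
"""

only_1, only_3 = diff_results(OUT_TDBL1, OUT_TDBL3)
print(only_1, only_3)  # both empty: same rows, different order
```

With tdbquery installed, the two sample strings could be replaced by `run_tdbquery("dbpedia_tdbl1", ...)` and `run_tdbquery("dbpedia_tdbl3", ...)` using the same query.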
A different query, matching on the predicate, run on the model built with
tdbloader:
./tdbquery --loc=dbpedia_tdbl1 "SELECT ?x ?y ?z WHERE { ?x <http://www.w3.org/2000/01/rdf-schema#label> ?z }"
----------------------------------------------------------------------------------------------------------
| x | y | z |
==========================================================================================================
| <http://dbpedia.org/resource/AccessibleComputing> | | "AccessibleComputing"@en |
| <http://dbpedia.org/resource/AfghanistanGeography> | | "AfghanistanGeography"@en |
| <http://dbpedia.org/resource/AfghanistanHistory> | | "AfghanistanHistory"@en |
| <http://dbpedia.org/resource/AfghanistanPeople> | | "AfghanistanPeople"@en |
| <http://dbpedia.org/resource/AfghanistanCommunications> | | "AfghanistanCommunications"@en |
| <http://dbpedia.org/resource/AfghanistanTransportations> | | "AfghanistanTransportations"@en |
| <http://dbpedia.org/resource/AfghanistanMilitary> | | "AfghanistanMilitary"@en |
| <http://dbpedia.org/resource/AfghanistanTransnationalIssues> | | "AfghanistanTransnationalIssues"@en |
| <http://dbpedia.org/resource/AmoeboidTaxa> | | "AmoeboidTaxa"@en |
----------------------------------------------------------------------------------------------------------
I expected the data loaded with tdbloader3 to return the same result, but the
query returned an empty result:
tdbquery --loc=dbpedia_tdbl3 "SELECT ?x ?y ?z WHERE { ?x <http://www.w3.org/2000/01/rdf-schema#label> ?z }"
-------------
| x | y | z |
=============
-------------
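To narrow down which bound position triggers the empty result, one option is to run the same three probe queries (subject bound, predicate bound, object bound) against both stores. The Python sketch below only builds and prints the tdbquery command lines; it uses one triple from the test file, and the function names are hypothetical. I have not confirmed which of these patterns fail beyond the predicate-bound case reported above.

```python
import shlex

# One triple from dbpedia.nt, used as the source of bound values.
S = "<http://dbpedia.org/resource/AmoeboidTaxa>"
P = "<http://www.w3.org/2000/01/rdf-schema#label>"
O = '"AmoeboidTaxa"@en'


def probe_queries(s, p, o):
    """One query per single bound position of the triple pattern."""
    return {
        "subject bound": f"SELECT * WHERE {{ {s} ?y ?z }}",
        "predicate bound": f"SELECT * WHERE {{ ?x {p} ?z }}",
        "object bound": f"SELECT * WHERE {{ ?x ?y {o} }}",
    }


def probe_commands(locations, s, p, o):
    """Return (location, label, argv) for every store/query pair."""
    commands = []
    for loc in locations:
        for label, query in probe_queries(s, p, o).items():
            commands.append((loc, label, ["tdbquery", f"--loc={loc}", query]))
    return commands


for loc, label, argv in probe_commands(["dbpedia_tdbl1", "dbpedia_tdbl3"], S, P, O):
    print(f"# {loc}: {label}")
    print(shlex.join(argv))  # shell-quoted, ready to paste
```

Each printed line can be pasted into a shell; a store that answers one pattern but not another would point at the access path used for that pattern.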
Any help would be much appreciated.