On 02/05/12 07:28, Ziya Akar wrote:
Hi,
I am trying to retrieve all DBpedia triples, to extract information
about the dataset, using the query below:
Select * where {?s ?p ?o} LIMIT 1000 OFFSET x.
I increment x by 1000 on each execution. After a while, the value of
x exceeds decimal bounds.
What is the error message exactly? (from which system?)
The error message is: com.hp.hpl.jena.query.QueryParseException:
Encountered "<DECIMAL>. It means that a Long is not enough to retrieve
all DBpedia results.
That's not a complete error. It prints out the offending token.
Please - a complete, minimal example
Complete - all the details (what's the query being parsed?)
Minimal - the least information needed (smallest offset)
It is also good practice to include version information.
A DECIMAL is the token for a number with a DOT in it.
When I tried:
SELECT * { ?s ?p ?o } LIMIT 1000 OFFSET 100000000000
When I tried:
SELECT * { ?s ?p ?o } LIMIT 1000 OFFSET 1000000000000000000000000000
using arq.qparse, I got:
08:49:07 WARN ParserSPARQL11 :: Unexpected throwable:
java.lang.NumberFormatException: For input string:
"1000000000000000000000000000"
but if I try:
qparse 'SELECT * { ?s ?p ?o } LIMIT 1000 OFFSET 10.5'
I get
Encountered " <DECIMAL> "10.5 "" at line 1, column 41.
Was expecting:
<INTEGER> ...
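The NumberFormatException for the very large offset is what one would expect if the value is parsed into a Java long. A minimal sketch in Python (not Jena code; the helper name fits_in_signed is made up for illustration) that checks which of the two offsets tried above fit in a signed 32-bit int or a signed 64-bit long:

```python
def fits_in_signed(value, bits):
    """Return True if value fits in a signed two's-complement integer of the given width."""
    return -(2 ** (bits - 1)) <= value <= 2 ** (bits - 1) - 1


# The two offsets from the queries tried above.
offsets = [100000000000, 1000000000000000000000000000]

for offset in offsets:
    print(
        offset,
        "fits int:", fits_in_signed(offset, 32),
        "fits long:", fits_in_signed(offset, 64),
    )
```

Neither offset contains a ".", so neither can produce a DECIMAL token; the larger one fails only because it is out of range for a long.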
May I suggest that you have put in a number with a "." in it, maybe over
1000, that is formatted using the convention that "." is the thousands
separator.
SPARQL integers are written without any separators.
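To illustrate the suspected cause, here is a small Python sketch. It assumes the token shapes roughly follow the SPARQL 1.1 grammar (INTEGER is digits only, DECIMAL is optional digits, a dot, then digits). The names token_kind and build_query are hypothetical helpers, not part of Jena. It shows that an offset grouped with "." looks like a DECIMAL, while a plain integer conversion does not:

```python
import re

# Token shapes, roughly as in the SPARQL 1.1 grammar (assumption for this sketch).
INTEGER = re.compile(r"[0-9]+")
DECIMAL = re.compile(r"[0-9]*\.[0-9]+")


def token_kind(text):
    """Classify a numeric literal the way a SPARQL tokenizer would see it."""
    if INTEGER.fullmatch(text):
        return "INTEGER"
    if DECIMAL.fullmatch(text):
        return "DECIMAL"
    return "OTHER"


def build_query(offset, page_size=1000):
    """Build the paging query; formatting an int directly never inserts separators."""
    return f"SELECT * {{ ?s ?p ?o }} LIMIT {page_size} OFFSET {offset}"


offset = 1000
# Simulate a locale-style formatter that uses "." as the thousands separator.
grouped = f"{offset:,}".replace(",", ".")

print(grouped, "->", token_kind(grouped))
print(str(offset), "->", token_kind(str(offset)))
print(build_query(offset))
```

If the client code builds the query string with a locale-aware number formatter, switching to a plain integer-to-string conversion should avoid the stray ".".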
A long goes up to 2^63, which is about 9 x 10^18, i.e. on the order of
10^18 or one million million million (an English trillion, or more
usually one billion billion, or "exa"). The LOD cloud is not an
exatriple of RDF. I wouldn't like to guarantee that ARQ copes that well
above an int, as it's not common, but even a Java int is 2 billion, and
2 billion triples at 1K triples per call is 2 million calls to the LOD
cloud copy.
http://en.wikipedia.org/wiki/Metric_prefix
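The arithmetic above can be checked with a short Python sketch. The one second per call figure is purely an assumption for illustration; real endpoint latency is not given in this thread.

```python
LONG_MAX = 2 ** 63 - 1   # largest Java long
INT_MAX = 2 ** 31 - 1    # largest Java int, about 2 billion
PAGE_SIZE = 1000         # triples per call, as in the query above

# Assumption for illustration only: each call takes one second.
SECONDS_PER_CALL = 1.0

calls_for_int_range = INT_MAX // PAGE_SIZE
days_for_int_range = calls_for_int_range * SECONDS_PER_CALL / 86400

print("long max:", LONG_MAX)
print("int max:", INT_MAX)
print("calls to page through an int's worth of triples:", calls_for_int_range)
print("days at the assumed rate:", round(days_for_int_range, 1))
```

Even under that optimistic assumption, paging through an int's worth of triples takes weeks, long before the offset gets anywhere near the limit of a long.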
How can I handle this situation? I want to continue to query.
Thanks.
Ziya
If this error comes from DBpedia, then you'll have to ask them.
Jena ARQ uses a long for the offset and limit - I suppose a BigInteger
might be necessary nowadays -- long ago, long was quite enough!
By the way - why not download the dumps of the database instead? Much
more efficient.
I am analyzing all datasets on the LOD cloud to extract information
about the datasets. DBpedia is only one of them. If I download dump
files, I have to query them to extract information too. But I can delete
the analyzed triples from the dump files, and then it works. But I
prefer querying at first.
It will take you a very long time to pull dbpedia over 1000 triples at a
time.
Andy
Ziya