> On 02/05/12 07:28, Ziya Akar wrote:
>>>> Hi,
>>>>
>>>> I am trying to retrieve all DBpedia triples to extract information
>>>> about the dataset with the query below:
>>>>
>>>>   Select * where {?s ?p ?o} LIMIT 1000 OFFSET x
>>>>
>>>> I increment x by 1000 on each execution. After a while the value of
>>>> x exceeds the decimal bounds.
>>
>>> What is the error message exactly? (from which system?)
>>
>> The error message is: com.hp.hpl.jena.query.QueryParseException:
>> Encountered "<DECIMAL>. I think it means that a long is not enough to
>> retrieve all DBpedia results.
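The DECIMAL token in the error above is what the SPARQL lexer produces for a number written with a dot in it, such as 2000.0 or 2.000. A minimal Java sketch follows, assuming the query text is built by string concatenation before it is handed to the parser (it does not call Jena itself). The class OffsetFormatting and its methods are hypothetical, and Locale.GERMANY is only an example of a locale whose thousands separator is "."; the thread does not say which locale was in use.

```java
import java.text.NumberFormat;
import java.util.Locale;

// Hypothetical helper: shows how an OFFSET value can turn into a SPARQL DECIMAL
// when the query string is assembled by hand, and how to avoid it.
final class OffsetFormatting {

    // Pitfall 1: a double concatenated into the query keeps its ".0",
    // so 2000 becomes "OFFSET 2000.0", which lexes as a DECIMAL.
    static String offsetFromDouble(double offset) {
        return "OFFSET " + offset;
    }

    // Pitfall 2: a locale-aware formatter may add a grouping separator;
    // with a "." separator, 2000 becomes "OFFSET 2.000", again a DECIMAL.
    static String offsetFromLocaleFormat(long offset, Locale locale) {
        return "OFFSET " + NumberFormat.getIntegerInstance(locale).format(offset);
    }

    // Fix: keep the offset in a long and print it with Long.toString,
    // which never adds a dot or a grouping separator.
    static String offsetClause(long offset) {
        return "OFFSET " + Long.toString(offset);
    }

    public static void main(String[] args) {
        System.out.println(offsetFromDouble(2000));                        // OFFSET 2000.0
        System.out.println(offsetFromLocaleFormat(2000, Locale.GERMANY));  // OFFSET 2.000
        System.out.println(offsetClause(2000));                            // OFFSET 2000
    }
}
```

Only the last form yields an INTEGER token, which is what the LIMIT and OFFSET clauses expect.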
> That's not a complete error. It prints out the offending token.
>
> Please - a complete, minimal example:
>   Complete - all the details (what's the query being parsed?)
>   Minimal  - the least information needed (smallest offset)
> It's also good practice to include version information.
>
> A DECIMAL is the token for a number with a DOT in it.
>
> When I tried:
>
>   SELECT * { ?s ?p ?o } LIMIT 1000 OFFSET 100000000000
>
> When I tried:
>
>   SELECT * { ?s ?p ?o } LIMIT 1000 OFFSET 1000000000000000000000000000
>
> using arq.qparse, I got:
>
>   08:49:07 WARN  ParserSPARQL11 :: Unexpected throwable:
>   java.lang.NumberFormatException: For input string:
>   "1000000000000000000000000000"
>
> but if I try:
>
>   qparse 'SELECT * { ?s ?p ?o } LIMIT 1000 OFFSET 10.5'
>
> I get
>
>   Encountered " <DECIMAL> "10.5 "" at line 1, column 41.
>   Was expecting:
>       <INTEGER> ...

I think the cause of my exception was setting a decimal number: I used a
decimal variable to set OFFSET. Now it works. It was my mistake.

> May I suggest that you have put in a number with a "." in it, maybe over
> 1000, that is formatted using the convention that "." is the thousands
> separator.
>
> Numbers are not written with separators in SPARQL.
>
> A long goes up to 2^63 - 1 (about 9.2 x 10^18), i.e. of the order of
> 10^18, or one million million million (an English trillion, or more
> usually one billion billion, or "exa"). The LOD cloud is not an
> exatriple of RDF. I wouldn't like to guarantee that ARQ copes well with
> values above an int, as that's not common, but even a Java int is about
> 2 billion, and 2 billion triples at 1K triples per call is 2 million
> calls to the LOD cloud copy.
>
> http://en.wikipedia.org/wiki/Metric_prefix
>
>>>>
>>>> How can I handle this situation? I want to continue querying.
>>>>
>>>> Thanks.
>>>>
>>>> Ziya
>>
>>> If this error comes from DBpedia, then you'll have to ask them.
>>>
>>> Jena ARQ uses a long for the offset and limit - I suppose a BigInteger
>>> might be necessary nowadays -- long ago, long was quite enough!
>>>
>>> By the way - why not download the dumps of the database instead?
>>> Much more efficient.
>>
>> I am analyzing all datasets in the LOD cloud to extract information
>> about them; DBpedia is only one of them. If I download the dump files,
>> I have to query them to extract information too. I can delete the
>> analyzed triples from the dump files, and then it works, but I prefer
>> querying first.
>
> It will take you a very long time to pull DBpedia over 1000 triples at
> a time.

There are approximately 226,000,000 triples in DBpedia. If I pull 10,000
triples at a time and each execution takes 20 seconds on average, that is
22,600 executions, i.e. 452,000 seconds, or about 5.2 days. That is good
enough for querying DBpedia for now.

>>
>>> Andy
>>
>> Ziya
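The paging loop and the 5.2-day estimate can be reproduced with a small Java sketch. The class DbpediaPaging and its methods are hypothetical; the figures (226,000,000 triples, 10,000 per page, 20 seconds per request) are the ones quoted above, and the model assumes every request takes the same average time. Offsets stay in a long throughout, so no DECIMAL token can appear in the generated query.

```java
import java.util.Locale;

// Hypothetical sketch: generate LIMIT/OFFSET page queries and estimate total time.
final class DbpediaPaging {

    // Query for 0-based page number `page`. multiplyExact throws
    // ArithmeticException on long overflow instead of silently wrapping.
    static String pageQuery(long page, long pageSize) {
        long offset = Math.multiplyExact(page, pageSize);
        return "SELECT * WHERE { ?s ?p ?o } LIMIT " + pageSize + " OFFSET " + offset;
    }

    // Requests needed to cover totalTriples, rounding up for a partial last page.
    static long calls(long totalTriples, long pageSize) {
        return (totalTriples + pageSize - 1) / pageSize;
    }

    // Total seconds, assuming every request takes the same average time.
    static long totalSeconds(long totalTriples, long pageSize, long secondsPerCall) {
        return calls(totalTriples, pageSize) * secondsPerCall;
    }

    public static void main(String[] args) {
        long total = 226_000_000L;   // approximate DBpedia size quoted in the thread
        long pageSize = 10_000L;     // triples per request
        long secondsPerCall = 20L;   // average time per request quoted in the thread

        System.out.println(pageQuery(2, pageSize));
        long secs = totalSeconds(total, pageSize, secondsPerCall);
        System.out.println(calls(total, pageSize) + " calls, " + secs + " s, "
                + String.format(Locale.ROOT, "%.1f", secs / 86_400.0) + " days");
    }
}
```

Running main prints the third page's query (OFFSET 20000) and "22600 calls, 452000 s, 5.2 days". One caveat: the SPARQL 1.1 specification notes that LIMIT/OFFSET slices are only predictable when the query also uses ORDER BY, so without it consecutive pages from an endpoint may overlap or skip solutions.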