On 02/05/12 07:28, Ziya Akar wrote:
>>> Hi,
>>>>
>>>> I am trying to retrieve all DBpedia triples to extract information
>>>> about the dataset with the query below:
>>>>
>>>> Select * where  {?s ?p ?o} LIMIT 1000 OFFSET x.
>>>>
>>>> I increment x by 1000 on each execution. After a while, the value of
>>>> x exceeds decimal bounds.
>>
>>> What is the error message exactly? (from which system?)
>>
>> Error message is : com.hp.hpl.jena.query.QueryParseException:
>> Encountered "<DECIMAL>. It means that Long is not enough to retrieve
>> all DBpedia results.

>That's not a complete error.  It prints out the offending token.

>Please - a complete, minimal example

>Complete - all the details (what's the query being parsed?)
>Minimal - the least information needed (smallest offset)

>It is also good practice to include version information.

>A DECIMAL is the token for a number with a DOT in it.


>When I tried:

>SELECT * { ?s ?p ?o } LIMIT 1000 OFFSET 100000000000


>When I tried:

>SELECT * { ?s ?p ?o } LIMIT 1000 OFFSET 1000000000000000000000000000

>using arq.qparse, I got:

>08:49:07 WARN  ParserSPARQL11       :: Unexpected throwable:
>java.lang.NumberFormatException: For input string:
>"1000000000000000000000000000"
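The NumberFormatException above is what plain Java produces when a digit string is too large for a long. The sketch below reproduces that with Long.parseLong and uses BigInteger to check the range first. The class and method names are hypothetical, and whether ARQ's parser calls Long.parseLong internally is an assumption, not something stated in this thread.

```java
import java.math.BigInteger;

public class OffsetRange {
    private static final BigInteger LONG_MAX = BigInteger.valueOf(Long.MAX_VALUE);

    /** True if the digit string is a non-negative integer that fits in a Java long. */
    static boolean fitsInLong(String digits) {
        // BigInteger has arbitrary precision, so parsing never overflows.
        BigInteger value = new BigInteger(digits);
        return value.signum() >= 0 && value.compareTo(LONG_MAX) <= 0;
    }

    public static void main(String[] args) {
        String small = "100000000000";                 // well within long range
        String huge = "1000000000000000000000000000";  // far beyond Long.MAX_VALUE

        System.out.println(small + " fits in long: " + fitsInLong(small));
        System.out.println(huge + " fits in long: " + fitsInLong(huge));

        try {
            Long.parseLong(huge);
        } catch (NumberFormatException e) {
            // Same exception type as in the qparse warning above.
            System.out.println("Long.parseLong failed: " + e.getMessage());
        }
    }
}
```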

>but if I try:

>qparse 'SELECT * { ?s ?p ?o } LIMIT 1000 OFFSET 10.5'

>I get

>Encountered " <DECIMAL> "10.5 "" at line 1, column 41.
>Was expecting:
>     <INTEGER> ...

I think the reason for my exception was that I was setting a decimal
number: I used a decimal variable to set OFFSET. Now it works. It was my
mistake.

>May I suggest that you have put in a number with a "." in it, maybe over
>1000, that is formatted using the convention that "." is the thousands
>separator.

>The number is not separated in SPARQL.
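A minimal sketch of how a "." can end up in the OFFSET when the query string is built in Java, assuming the query is assembled by string concatenation (the thread does not show the actual client code). The class and method names are hypothetical. Keeping the offset in a long and concatenating it directly yields plain digits, which is what the parser expects for an <INTEGER>.

```java
import java.util.Locale;

public class OffsetFormatting {
    /** Builds the paged query; a long is rendered as plain digits with no separators. */
    static String pagedQuery(long limit, long offset) {
        return "SELECT * { ?s ?p ?o } LIMIT " + limit + " OFFSET " + offset;
    }

    public static void main(String[] args) {
        // Pitfall 1: a double offset is rendered with a fractional part.
        double decimalOffset = 10000;
        System.out.println("OFFSET " + decimalOffset);  // OFFSET 10000.0

        // Pitfall 2: locale-aware grouping inserts "." as the thousands separator.
        String grouped = String.format(Locale.GERMANY, "%,d", 10000L);
        System.out.println("OFFSET " + grouped);        // OFFSET 10.000

        // Safe: integer type, no formatting.
        System.out.println(pagedQuery(1000L, 10000L));
    }
}
```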



>A long goes up to 2^63, which is about 9 x 10^18, or several million
>million million (an English trillion, or more usually a billion billion,
>or "exa").  The LOD cloud is not an exatriple of RDF.  I wouldn't like to
>guarantee that ARQ copes that well above an int, as it's not common, but
>even a Java int is about 2 billion, and 2 billion triples at 1K triples
>per call is 2 million calls to the LOD cloud copy.
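The bounds quoted above can be checked with a few lines of Java. This is only an arithmetic sketch; the class and method names are hypothetical, and the page size of 1000 is the one used in the original query.

```java
public class OffsetBounds {
    /** Calls needed to page through totalTriples at pageSize per call, rounded up. */
    static long callsNeeded(long totalTriples, long pageSize) {
        return (totalTriples + pageSize - 1) / pageSize;
    }

    public static void main(String[] args) {
        // Largest offsets representable as int and long.
        System.out.println("Integer.MAX_VALUE = " + Integer.MAX_VALUE);
        System.out.println("Long.MAX_VALUE    = " + Long.MAX_VALUE);

        // Paging up to the int limit at 1000 triples per call.
        System.out.println("Calls to reach the int limit: "
                + callsNeeded(Integer.MAX_VALUE, 1000L));
    }
}
```

Paging as far as the int limit already takes a little over 2 million calls, so the long limit is not the practical constraint here.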


>http://en.wikipedia.org/wiki/Metric_prefix

>>>>
>>>> How can I handle this situation?  I want to continue to query.
>>>>
>>>> Thanks.
>>>>
>>>> Ziya
>>
>>> If this error comes from DBpedia, then you'll have to ask them.
>>>
>>> Jena ARQ uses a long for the offset and limit - I suppose a BigInteger
>>> might be necessary nowadays -- long ago, long was quite enough!
>>>
>>> By the way - why not download the dumps of the database instead?  Much
>>> more efficient.
>>
>> I am analyzing all datasets on the LOD cloud to extract information
>> about them. DBpedia is only one of them. If I download the dump files, I
>> have to query them to extract information too. But I can delete the
>> analyzed triples from the dump files and then it works. But I prefer
>> querying first.

>It will take you a very long time to pull dbpedia over 1000 triples at a
>time.

There are approximately 226000000 triples in DBpedia. If I pull 10000
triples at a time and each execution takes 20 seconds on average, that
is 22600 calls, or 452000 seconds = 5.2 days. That is good enough for
querying DBpedia for now.
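The estimate above can be reproduced as follows. The triple count, page size, and 20 seconds per call are the figures from this message; the class and method names are hypothetical.

```java
public class CrawlEstimate {
    /** Total seconds to page through all triples, one call per page, rounded up. */
    static long totalSeconds(long triples, long pageSize, long secondsPerCall) {
        long calls = (triples + pageSize - 1) / pageSize;
        return calls * secondsPerCall;
    }

    public static void main(String[] args) {
        long seconds = totalSeconds(226_000_000L, 10_000L, 20L);
        double days = seconds / 86_400.0;  // 86400 seconds per day
        System.out.printf("%d seconds = %.1f days%n", seconds, days);
    }
}
```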

>>
>>>     Andy
>>
>> Ziya
