On 15 August 2013 19:35, Hady elsahar <[email protected]> wrote:
> Hello All,
>
> i'm using Raptor 2.0.6  for converting files from Turtle format to Ntriples
> format
>
> the command is :
> rapper -i turtle turtle-20130811-links.ttl > turtle-20130811-links.nt
>
>
> it gives me error :
> out of dynamic memory in turtle_lexer__scan_bytes()
>
> when i searched for the issue tracker for raptor i found it's a known issue
> http://bugs.librdf.org/mantis/view.php?id=512
>
> long story short :
>>
>> a) the entire input has to be in one buffer in memory
>> b) it uses 32 bit offsets and reserves the top bit for something internal
>
>
> is there a way i can overcome this using raptor ?
> if not is there an alternative that i can use instead for large files.
>
> considering i'm working on a ~9GB turtle dump on 6GB RAM machine

It depends on what you want to do. How many triples do you really need
to keep in memory? Most Scala scripts I wrote for the DBpedia 3.8 and
3.9 releases just pipe and filter data, so they never hold more than a
few lines at a time, but some need to keep tens of millions of links
in memory. Because the URIs are all very similar
(http://de.dbpedia.org/resource/Berlin,
http://fr.dbpedia.org/resource/Berlin, etc.), it's possible to take
them apart, store only the parts that differ ("de", "fr", "Berlin")
and reconstruct the full URIs only for serialisation. For example, the
script [1] that processes the Wikidata link dump that Daniel Kinzler
gave us works with 4GB RAM.
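
To make the idea concrete, here is a minimal sketch in Python. It is
not the code from [1]; the class name CompactLinks, the regular
expression and the owl:sameAs default predicate are all assumptions
made for illustration. It only handles URIs of the form
http://<lang>.dbpedia.org/resource/<title> and does no N-Triples
escaping.

```python
import re
from array import array

# Assumed URI shape: http://<lang>.dbpedia.org/resource/<title>
_URI = re.compile(r"^http://([a-z\-]+)\.dbpedia\.org/resource/(.+)$")
_TEMPLATE = "http://%s.dbpedia.org/resource/%s"


class CompactLinks:
    """Hypothetical store: each link is kept as four small integer ids."""

    def __init__(self):
        self._ids = {}             # distinct string -> integer id
        self._strings = []         # integer id -> distinct string
        self._links = array("I")   # flat: src lang, src title, dst lang, dst title

    def _intern(self, s):
        # Store each distinct language code or title only once.
        i = self._ids.get(s)
        if i is None:
            i = len(self._strings)
            self._ids[s] = i
            self._strings.append(s)
        return i

    def _split(self, uri):
        # Take the URI apart and keep only the parts that differ.
        m = _URI.match(uri)
        if m is None:
            raise ValueError("unexpected URI shape: " + uri)
        return self._intern(m.group(1)), self._intern(m.group(2))

    def add(self, src, dst):
        self._links.extend(self._split(src) + self._split(dst))

    def __len__(self):
        return len(self._links) // 4

    def triples(self, predicate="http://www.w3.org/2002/07/owl#sameAs"):
        # Rebuild the full URIs only at serialisation time.
        s, l = self._strings, self._links
        for k in range(0, len(l), 4):
            src = _TEMPLATE % (s[l[k]], s[l[k + 1]])
            dst = _TEMPLATE % (s[l[k + 2]], s[l[k + 3]])
            yield "<%s> <%s> <%s> ." % (src, predicate, dst)
```

Feeding it links one at a time from a streamed input and writing the
result of triples() straight to a file keeps only the integer ids and
the distinct strings in memory, never the full URIs.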

I bet there are many other data structures to store millions of
similar strings efficiently.
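
One such option, which is not something the scripts mentioned here
are known to use, is front coding: sort the strings and store, for
each one, only how many leading characters it shares with its
predecessor plus the remaining suffix. A rough sketch, with
front_encode and front_decode as hypothetical helper names:

```python
import os


def front_encode(sorted_strings):
    """Return (shared_prefix_length, suffix) pairs for a sorted sequence."""
    encoded = []
    previous = ""
    for s in sorted_strings:
        # commonprefix compares character by character, not path segments.
        shared = len(os.path.commonprefix([previous, s]))
        encoded.append((shared, s[shared:]))
        previous = s
    return encoded


def front_decode(encoded):
    """Rebuild the original strings, in order, from front_encode output."""
    previous = ""
    for shared, suffix in encoded:
        previous = previous[:shared] + suffix
        yield previous
```

The sketch only shows the encoding. A real saving would need the
pairs packed into a byte buffer rather than kept as Python tuples,
and random access would need periodic unencoded restart entries.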

Maybe you can also change Markus' Python scripts to produce the syntax you want.

Regards,
JC

[1] https://github.com/dbpedia/extraction-framework/blob/dump/scripts/src/main/scala/org/dbpedia/extraction/scripts/ProcessWikidataLinks.scala

>
> thanks
> Regards
>
> -------------------------------------------------
> Hady El-Sahar
> Research Assistant
> Center of Informatics Sciences | Nile University
>
>

_______________________________________________
Dbpedia-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-developers
