On 15 August 2013 19:35, Hady elsahar <[email protected]> wrote:
> Hello All,
>
> i'm using Raptor 2.0.6 for converting files from Turtle format to Ntriples
> format
>
> the command is :
> rapper -i turtle turtle-20130811-links.ttl > turtle-20130811-links.nt
>
> it gives me error :
> out of dynamic memory in turtle_lexer__scan_bytes()
>
> when i searched for the issue tracker for raptor i found it's a known issue
> http://bugs.librdf.org/mantis/view.php?id=512
>
> long story short :
>>
>> a) the entire input has to be in one buffer in memory
>> b) it uses 32 bit offsets and reserves the top bit for something internal
>
> is there a way i can overcome this using raptor ?
> if not is there an alternative that i can use instead for large files.
>
> considering i'm working on a ~9GB turtle dump on 6GB RAM machine

It depends on what you want to do. How many triples do you really need to keep in memory?

Most of the Scala scripts I wrote for the DBpedia 3.8 and 3.9 releases just pipe and filter data, but some need to keep tens of millions of links in memory. Because the URIs are all very similar (http://de.dbpedia.org/resource/Berlin, http://fr.dbpedia.org/resource/Berlin, etc.), it's possible to take them apart, store only the parts that differ ("de", "fr", "Berlin") and reconstruct the full URIs only for serialisation. For example, the script [1] that processes the Wikidata link dump that Daniel Kinzler gave us works with 4GB RAM. I bet there are many other data structures that store millions of similar strings efficiently.

Maybe you can also change Markus' Python scripts to produce the syntax you want.

Regards,
JC

[1] https://github.com/dbpedia/extraction-framework/blob/dump/scripts/src/main/scala/org/dbpedia/extraction/scripts/ProcessWikidataLinks.scala

>
> thanks
> Regards
>
> -------------------------------------------------
> Hady El-Sahar
> Research Assistant
> Center of Informatics Sciences | Nile University
>

_______________________________________________
Dbpedia-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-developers
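
To illustrate the URI-splitting idea from the reply above, here is a minimal Python sketch. It is not the code of the Scala script [1]; the names `split_uri`, `join_uri` and `LinkStore`, the assumed URI shape, and the choice of owl:sameAs as the predicate are all assumptions made for this example. Each link is kept as a small tuple (language id, title, language id, title), and full URIs are rebuilt only when N-Triples lines are written.

```python
import re
import sys
from typing import Dict, Iterator, List, Tuple

# Assumed URI shape, based on the two example URIs in the message.
# The language subdomain is optional, so http://dbpedia.org/resource/...
# also parses (its language is stored as the empty string).
URI_PATTERN = re.compile(r"^http://(?:([a-z-]+)\.)?dbpedia\.org/resource/(.+)$")

# Assumed predicate for inter-language links (hypothetical choice).
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def split_uri(uri: str) -> Tuple[str, str]:
    """Split a DBpedia resource URI into (language, title)."""
    match = URI_PATTERN.match(uri)
    if match is None:
        raise ValueError(f"unexpected URI shape: {uri}")
    return match.group(1) or "", match.group(2)


def join_uri(lang: str, title: str) -> str:
    """Rebuild the full URI from its parts."""
    host = f"{lang}.dbpedia.org" if lang else "dbpedia.org"
    return f"http://{host}/resource/{title}"


class LinkStore:
    """Stores links as (lang id, title, lang id, title) tuples."""

    def __init__(self) -> None:
        self._lang_ids: Dict[str, int] = {}
        self._langs: List[str] = []
        self._links: List[Tuple[int, str, int, str]] = []

    def _lang_id(self, lang: str) -> int:
        # Map each language code to a small integer, assigned on first use.
        lang_id = self._lang_ids.get(lang)
        if lang_id is None:
            lang_id = len(self._langs)
            self._lang_ids[lang] = lang_id
            self._langs.append(lang)
        return lang_id

    def add(self, subject_uri: str, object_uri: str) -> None:
        s_lang, s_title = split_uri(subject_uri)
        o_lang, o_title = split_uri(object_uri)
        # sys.intern lets equal titles ("Berlin") share one string object.
        self._links.append((
            self._lang_id(s_lang), sys.intern(s_title),
            self._lang_id(o_lang), sys.intern(o_title),
        ))

    def ntriples(self) -> Iterator[str]:
        """Yield one N-Triples line per link, rebuilding URIs lazily.

        No escaping is applied; the URIs are assumed to be valid as-is.
        """
        for s_lang, s_title, o_lang, o_title in self._links:
            subject = join_uri(self._langs[s_lang], s_title)
            obj = join_uri(self._langs[o_lang], o_title)
            yield f"<{subject}> <{OWL_SAME_AS}> <{obj}> ."


store = LinkStore()
store.add("http://de.dbpedia.org/resource/Berlin",
          "http://fr.dbpedia.org/resource/Berlin")
for line in store.ntriples():
    print(line)
```

In Python the per-tuple overhead limits the actual savings, so this only shows the decomposition itself; a tighter layout (for example integer ids for titles held in arrays) would be needed to approach the memory figures mentioned in the reply.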
