Hello JC,

I've looked around online for some N-Triples/Turtle parsers
and found this useful thread:
http://answers.semanticweb.com/questions/14084/fast-tool-to-convert-ttl-to-ntriples

I went with serdi <http://drobilla.net/software/serd/>, and it seems it
doesn't need nearly as much memory as Raptor. I'm giving it a try on the
server now as one last (single-command, hopefully bug-free) attempt.
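
For reference, the conversion should then be a single command roughly like
this (serdi takes -i and -o options for the input and output syntax; I still
need to double-check the exact flags against its documentation):

serdi -i turtle -o ntriples turtle-20130811-links.ttl > turtle-20130811-links.nt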

If it runs into problems, I'll consider one of the other alternatives.

Sebastian and I had a meeting today to discuss this issue (no reliable
N-Triples Wikidata dump is available yet) as well as the output dumps we
need. It might be faster and more to the point to parse the XML/JSON format
ourselves, reusing code that already exists in the extraction framework,
such as the JSON parser used in Markus' code.

Let's see what serdi produces (it looks promising), and I'll follow up with
a more detailed email about the results.


Thanks,
Regards,





On Thu, Aug 15, 2013 at 8:19 PM, Jona Christopher Sahnwaldt
<[email protected]> wrote:

> On 15 August 2013 19:35, Hady elsahar <[email protected]> wrote:
> > Hello All,
> >
> > I'm using Raptor 2.0.6 to convert files from Turtle format to N-Triples
> > format.
> >
> > The command is:
> > rapper -i turtle turtle-20130811-links.ttl > turtle-20130811-links.nt
> >
> >
> > It gives me this error:
> > out of dynamic memory in turtle_lexer__scan_bytes()
> >
> > When I searched the Raptor issue tracker, I found it's a known issue:
> > http://bugs.librdf.org/mantis/view.php?id=512
> >
> > Long story short:
> >> a) the entire input has to be in one buffer in memory
> >> b) it uses 32 bit offsets and reserves the top bit for something internal
> >
> >
> > Is there a way I can work around this using Raptor?
> > If not, is there an alternative I can use instead for large files?
> >
> > (I'm working with a ~9GB Turtle dump on a machine with 6GB of RAM.)
>
> It depends on what you want to do. How many triples do you really need
> to keep in memory? Most Scala scripts I wrote for the DBpedia 3.8 and
> 3.9 releases just pipe and filter data, but some need to keep tens of
> millions of links in memory. Because the URIs are all very similar
> (http://de.dbpedia.org/resource/Berlin,
> http://fr.dbpedia.org/resource/Berlin, etc.), it's possible to take
> them apart, store only the parts that differ ("de", "fr", "Berlin")
> and reconstruct them only for serialisation. For example, the script
> [1] that processes the Wikidata link dump that Daniel Kinzler gave us
> works with 4GB RAM.
>
> I bet there are many other data structures to store millions of
> similar strings efficiently.
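
(Just to check that I understood the approach above, here is a rough Scala
sketch of the idea, assuming a fixed language list and the usual
http://xx.dbpedia.org/resource/Title pattern; the real ProcessWikidataLinks
code is of course more involved:)

// Rough sketch only (assumed names; not the actual ProcessWikidataLinks code):
// store DBpedia resource URIs as (language index, title) pairs and rebuild
// the full URI string only when writing the output.
object UriCompressionSketch {

  // assumed fixed list of languages appearing in the dump
  val languages = Array("de", "en", "fr", "it", "nl")
  private val langIndex = languages.zipWithIndex.toMap

  // matches e.g. http://de.dbpedia.org/resource/Berlin -> ("de", "Berlin")
  private val ResourceUri = "http://([a-z-]+)\\.dbpedia\\.org/resource/(.*)".r

  // keep only the parts that differ between URIs
  def compress(uri: String): Option[(Int, String)] = uri match {
    case ResourceUri(lang, title) => langIndex.get(lang).map(id => (id, title))
    case _ => None
  }

  // reconstruct the full URI only for serialisation
  def expand(langId: Int, title: String): String =
    "http://" + languages(langId) + ".dbpedia.org/resource/" + title

  def main(args: Array[String]): Unit = {
    compress("http://de.dbpedia.org/resource/Berlin") match {
      case Some((id, title)) => println(expand(id, title)) // prints the URI back
      case None => println("URI did not match the expected pattern")
    }
  }
}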
>
> Maybe you can also change Markus' Python scripts to produce the syntax you
> want.
>
> Regards,
> JC
>
> [1]
> https://github.com/dbpedia/extraction-framework/blob/dump/scripts/src/main/scala/org/dbpedia/extraction/scripts/ProcessWikidataLinks.scala
>
> >
> > thanks
> > Regards
> >
> > -------------------------------------------------
> > Hady El-Sahar
> > Research Assistant
> > Center of Informatics Sciences | Nile University
> >
> >
>



-- 
-------------------------------------------------
Hady El-Sahar
Research Assistant
Center of Informatics Sciences | Nile University <http://nileuniversity.edu.eg/>