Hi Hady,
You could reuse a lot of the already defined utility functions for file &
triple parsing, but you are not so familiar with the framework yet, so that
will come in time.
See my answers to your questions inline.
On Thu, Jul 18, 2013 at 12:57 AM, Hady elsahar <[email protected]> wrote:
> Hello all ,
>
> Hoping that everyone is enjoying the summer ,
>
> I've written a Scala script
> <https://github.com/hadyelsahar/extraction-framework/blob/lang-link-extract/scripts/src/main/scala/org/dbpedia/extraction/scripts/LanguageSpecificLinksGenerator.scala>
> to generate the language-specific LL files to be uploaded, as mentioned
> by JC here
> <http://www.mail-archive.com/[email protected]/msg00148.html>
>
> Option 0 in the script extracts the master LL file;
> option 1 extracts the language-specific link files.
>
> The first iteration of the code has complexity O(n^2), where n is the
> number of lines in the master LL file. It seems quite dumb and would take
> a lot of time when run on the big dump. There are a lot of easy ways to
> optimize this, but I had some questions:
>
> 1. Can we depend on the triples in the RDF dump being in order? I.e., will
> all triples of an entity (e.g. Q1000) come one after another, so that we
> don't need to parse the rest of the file for related triples?
>
In general, no. If you need them that way you can add a "sort" step in the
process pipeline.
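
Something along these lines could serve as that sort step. This is only a
rough sketch with placeholder file names; for the full dump an external sort
(e.g. GNU sort keyed on the subject column) is the safer choice, since the
whole file won't fit in memory:

import java.io.{File, PrintWriter}
import scala.io.Source

// Sketch of a pre-processing sort step: order the N-Triples lines by subject
// so that all triples of one entity end up next to each other.
object SortTriplesBySubject {
  def main(args: Array[String]): Unit = {
    val input  = "wikidata-langlinks.nt"        // placeholder input dump
    val output = "wikidata-langlinks-sorted.nt" // placeholder sorted output

    // Load the triple lines (skipping comments) and sort them by the subject,
    // i.e. the first whitespace-delimited token of each N-Triples line.
    val lines = Source.fromFile(input, "UTF-8").getLines()
      .filter(line => line.nonEmpty && !line.startsWith("#"))
      .toVector
    val sorted = lines.sortBy(line => line.takeWhile(c => !c.isWhitespace))

    val writer = new PrintWriter(new File(output), "UTF-8")
    try {
      sorted.foreach(line => writer.println(line))
    } finally {
      writer.close()
    }
  }
}

With the file sorted this way, your script can process each entity's triples
as one consecutive block instead of searching the whole file for them.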
> 2. For that task, which is better to optimize: memory or time? Loading the
> file into a HashMap would speed things up a lot, but it may take some
> memory.
>
We'd prefer time, but it always depends. A few extra GB of memory should be
acceptable, but if you want to load a map with all Wikidata entries, that
will not scale well.
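
To illustrate the trade-off: a single pass that indexes the master LL file by
subject in a HashMap drops your scan from O(n^2) to O(n), at the cost of
memory that grows with the dump. Just a sketch, with a placeholder file name
and entity URI:

import scala.collection.mutable
import scala.io.Source

// Sketch of the hash-indexed approach: one pass over the dump, then constant-
// time lookup of all triples that share a subject.
object GroupLanguageLinks {
  def main(args: Array[String]): Unit = {
    val input = "wikidata-langlinks.nt" // placeholder master LL dump

    // O(n): index every triple line by its subject instead of re-scanning the
    // file for each entity. Memory grows with the dump size, so for the full
    // Wikidata dump a pre-sorted file plus a streaming pass is safer.
    val bySubject = mutable.HashMap.empty[String, mutable.ArrayBuffer[String]]
    for (line <- Source.fromFile(input, "UTF-8").getLines()
         if line.nonEmpty && !line.startsWith("#")) {
      val subject = line.takeWhile(c => !c.isWhitespace)
      bySubject.getOrElseUpdate(subject, mutable.ArrayBuffer.empty[String]) += line
    }

    // All triples of one entity are now a single lookup away (placeholder URI).
    bySubject.get("<http://example.org/entity/Q1000>").foreach(_.foreach(println))
  }
}

If a map over the whole Wikidata dump gets too big, the sorted-file approach
above gives you the same grouping with a streaming pass and constant memory.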
> 3. Just out of curiosity and to set a baseline: how long does the language
> links extraction process for Wikipedia take, and do we dedicate a special
> server to it, or is it just a small process that doesn't need one?
>
It's a small task compared to the Wikipedia extraction. At the scale of only
the language chapters it takes around 15-30 minutes. But the initial ILL dump
is created with the extraction process, so it's not directly comparable.
Best,
Dimitris
> 4. Any suggestions would be great.
>
> Thanks and regards,
>
> -------------------------------------------------
> Hady El-Sahar
> Research Assistant
> Center of Informatics Sciences | Nile
> University<http://nileuniversity.edu.eg/>
>
> email : [email protected]
> Phone : +2-01220887311
> http://hadyelsahar.me/
>
> <http://www.linkedin.com/in/hadyelsahar>
>
>
>
--
Dimitris Kontokostas
Department of Computer Science, University of Leipzig
Research Group: http://aksw.org
Homepage:http://aksw.org/DimitrisKontokostas
_______________________________________________
Dbpedia-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-developers