Hi Hady,
You could reuse a lot of the already defined utility functions for file &
triple parsing, but you are not so familiar with the framework yet, so that
will come in time.
See my answers to your questions inline.
On Thu, Jul 18, 2013 at 12:57 AM, Hady elsahar <[email protected]> wrote:
> Hello all ,
>
> Hoping that everyone is enjoying the summer ,
>
> I've written a Scala script
> <https://github.com/hadyelsahar/extraction-framework/blob/lang-link-extract/scripts/src/main/scala/org/dbpedia/extraction/scripts/LanguageSpecificLinksGenerator.scala>
> to generate the language-specific LL files to be uploaded, as mentioned
> by JC here
> <http://www.mail-archive.com/[email protected]/msg00148.html>
>
> Option 0 in the script extracts the master LL file;
> option 1 extracts the language-specific link files.
>
> The first iteration of the code has complexity O(n^2), where n is the
> number of lines in the master LL file. It seems quite dumb and would take
> a lot of time when run on the big dump. There are a lot of easy ways to
> optimize this, but I had some questions:
>
> 1. Can we depend on the triples in the RDF dump being in order? I.e., will
> all triples of an entity (e.g. Q1000) come one after another, so that we
> don't need to parse the rest of the file for related triples?
>
In general, no. If you need them that way you can add a "sort" step in the
process pipeline.
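
Something along these lines could serve as that sort step. This is only a
rough sketch with placeholder file names; for the full dump an external sort
(e.g. GNU sort keyed on the subject column) is the safer choice, since the
whole file won't fit in memory:

import java.io.{File, PrintWriter}
import scala.io.Source

// Sketch of a pre-processing sort step: order the N-Triples lines by subject
// so that all triples of one entity end up next to each other.
object SortTriplesBySubject {
  def main(args: Array[String]): Unit = {
    val input  = "wikidata-langlinks.nt"        // placeholder input dump
    val output = "wikidata-langlinks-sorted.nt" // placeholder sorted output

    // Load the triple lines (skipping comments) and sort them by the subject,
    // i.e. the first whitespace-delimited token of each N-Triples line.
    val lines = Source.fromFile(input, "UTF-8").getLines()
      .filter(line => line.nonEmpty && !line.startsWith("#"))
      .toVector
    val sorted = lines.sortBy(line => line.takeWhile(c => !c.isWhitespace))

    val writer = new PrintWriter(new File(output), "UTF-8")
    try {
      sorted.foreach(line => writer.println(line))
    } finally {
      writer.close()
    }
  }
}

With the file sorted this way, your script can process each entity's triples
as one consecutive block instead of searching the whole file for them.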
> 2. For that task, which is better to optimize: memory or time? Loading the
> file into a HashMap would speed things up a lot, but it may take some
> memory.
>
We'd prefer time, but it always depends. A few extra GB of memory should be
acceptable, but if you want to load a map with all Wikidata entries, that
will not scale well.
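
To illustrate the trade-off: a single pass that indexes the master LL file by
subject in a HashMap drops your scan from O(n^2) to O(n), at the cost of
memory that grows with the dump. Just a sketch, with a placeholder file name
and entity URI:

import scala.collection.mutable
import scala.io.Source

// Sketch of the hash-indexed approach: one pass over the dump, then constant-
// time lookup of all triples that share a subject.
object GroupLanguageLinks {
  def main(args: Array[String]): Unit = {
    val input = "wikidata-langlinks.nt" // placeholder master LL dump

    // O(n): index every triple line by its subject instead of re-scanning the
    // file for each entity. Memory grows with the dump size, so for the full
    // Wikidata dump a pre-sorted file plus a streaming pass is safer.
    val bySubject = mutable.HashMap.empty[String, mutable.ArrayBuffer[String]]
    for (line <- Source.fromFile(input, "UTF-8").getLines()
         if line.nonEmpty && !line.startsWith("#")) {
      val subject = line.takeWhile(c => !c.isWhitespace)
      bySubject.getOrElseUpdate(subject, mutable.ArrayBuffer.empty[String]) += line
    }

    // All triples of one entity are now a single lookup away (placeholder URI).
    bySubject.get("<http://example.org/entity/Q1000>").foreach(_.foreach(println))
  }
}

If a map over the whole Wikidata dump gets too big, the sorted-file approach
above gives you the same grouping with a streaming pass and constant memory.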
> 3. Just out of curiosity and to set a baseline: how long does the language
> links extraction process for Wikipedia take, and do we dedicate a special
> server to it, or is it just a small process that doesn't need one?
>
It's a small task compared to the Wikipedia extraction. At the scale of only
the language chapters it takes around 15-30 minutes. But the initial ILL dump
is created with the extraction process, so it's not directly comparable.
Best,
Dimitris
> 4. Any suggestions would be great.
>
> Thanks and regards,
>
> -------------------------------------------------
> Hady El-Sahar
> Research Assistant
> Center of Informatics Sciences | Nile
> University<http://nileuniversity.edu.eg/>
>
> email : [email protected]
> Phone : +2-01220887311
> http://hadyelsahar.me/
>
> <http://www.linkedin.com/in/hadyelsahar>
>
>
>
--
Dimitris Kontokostas
Department of Computer Science, University of Leipzig
Research Group: http://aksw.org
Homepage:http://aksw.org/DimitrisKontokostas
_______________________________________________
Dbpedia-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-developers