Hello all,

It seems that we have some issues with the wda-extract Python code.
I'm sorting this out with Markus in this issue on
GitHub<https://github.com/mkroetzsch/wda/issues/1#issuecomment-22661174>,
but there is usually some idle time between replies.

In the meantime, I wonder if there are any comments on my previous email regarding the LL
Scala code, or whether we could start on the next step (maybe the property mappings).


thanks
Regards


On Mon, Aug 12, 2013 at 6:50 PM, Hady elsahar <[email protected]> wrote:

> Hello All,
>
> Because there are still some bugs in the wda script for Wikidata extracts,
> Sebastian asked me to run the LL extraction scripts on the server for the
> part of the data that is not affected by those bugs; that comes to about 7M triples.
>
> The process is as follows:
>
>    1. Run the wda script with the 'turtle-links' option.
>    2. Unzip the extract and convert it to N-Triples format using *rapper*:
>       - rapper -i turtle turtle-20130808-links.ttl
>    3. Generate the master LL file:
>       - sudo mvn scala:run -Dlauncher=GenerateLLMasterFile
>    4. Generate the language-specific link files:
>       - sudo mvn scala:run -Dlauncher=GenerateLLSpecificFiles
>
> PS: for steps 3 and 4, update the arguments of each script (the locations of
> the input/output dumps) in the pom.xml file inside the scripts folder. A
> sketch of the whole pipeline is included below.
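>
> For reference, here is a minimal sketch of the pipeline as shell commands
> (file names and the compressed dump name are illustrative, and the exact
> launcher arguments live in scripts/pom.xml as noted above):
>
>    # step 1: run the wda script with the 'turtle-links' option
>    #         (see the wda README for the exact invocation)
>    # step 2: unzip the dump and convert it to N-Triples with rapper
>    gunzip turtle-20130808-links.ttl.gz
>    rapper -i turtle turtle-20130808-links.ttl > turtle-20130808-links.nt
>    # step 3: generate the master LL file (paths configured in scripts/pom.xml)
>    sudo mvn scala:run -Dlauncher=GenerateLLMasterFile
>    # step 4: generate the per-language LL files
>    sudo mvn scala:run -Dlauncher=GenerateLLSpecificFiles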
>
> *What's done so far:*
>
>    1. 7 million triples passed the rapper phase without hitting a bug.
>    2. Ran the master LL file extraction (output in
>    /root/hady_wikidata_extraction/Datasets/languagelinks/MasterLLfile.nt).
>    3. Ran the specific LL files extraction (output in
>    /root/hady_wikidata_extraction/Datasets/languagelinks/LLfiles/).
>    4. Currently running the new version of the wda Python script to get a
>    larger initial dump of more than 7M triples.
>
> *Benchmarking (for the 7 million triples, on the lgd server):*
>
>    1. Generating the master LL file: 28 seconds
>    2. Generating the specific files: 3 minutes, 10 seconds
>
>
> 7M triples may not be very representative, but from this benchmark I expect
> the process to scale, because steps 3 and 4 above both have linear complexity
> in the number of triples.
>
> Doing the math, assuming that every entity has links in all languages
> (around 120, which is a very pessimistic assumption), the whole process
> would take some tens of hours.
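>
> A rough back-of-the-envelope version of that estimate (the entity count here
> is an assumed, illustrative figure):
>
>    steps 3 + 4 on 7M triples: 28 s + 190 s = 218 s  =>  ~32,000 triples/sec
>    assumed ~10M entities x 120 links each = ~1.2 billion triples
>    1.2e9 / 32,000 = ~37,500 s, i.e. about 10.5 hours for steps 3 and 4 alone;
>    with the wda export and the rapper conversion on top, the total lands in
>    the tens of hours.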
>
>
> thanks
> Regards
>
> -------------------------------------------------
> Hady El-Sahar
> Research Assistant
> Center of Informatics Sciences | Nile University<http://nileuniversity.edu.eg/>
>
>
>


-- 
-------------------------------------------------
Hady El-Sahar
Research Assistant
Center of Informatics Sciences | Nile University<http://nileuniversity.edu.eg/>