Hello All,

Because there are still some bugs in the wda script for Wikidata extracts,
Sebastian asked me to run the LL (language links) extraction scripts on the
server for the part of the data that is not affected by those bugs, which is
about 7M triples.

The process is as follows:

   1. Run the wda script with the option 'turtle-links'.
   2. Unzip the extracts and convert them to N-Triples format using *rapper*:

      - rapper -i turtle turtle-20130808-links.ttl
   3. Generate the Master LL file with the command:
      - sudo mvn scala:run -Dlauncher=GenerateLLMasterFile
   4. Generate the language-specific LL files:
      - sudo mvn scala:run -Dlauncher=GenerateLLSpecificFiles

PS: for steps 3 and 4, update the arguments of each script (the locations of
the input and output dumps) in the pom.xml file inside the scripts folder.
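
For anyone who wants to repeat steps 2 to 4 in one go, here is a minimal Python sketch that chains the commands listed above. It is only an illustration: the function names, the scripts_dir and nt_file parameters and the dry_run switch are my own assumptions, not part of the extraction framework. It relies on rapper writing N-Triples to standard output by default, and on mvn being started from the folder that holds the pom.xml.

```python
import subprocess


def build_commands(ttl_file):
    """Return the commands for steps 2 to 4, exactly as listed in the mail."""
    return [
        ["rapper", "-i", "turtle", ttl_file],
        ["sudo", "mvn", "scala:run", "-Dlauncher=GenerateLLMasterFile"],
        ["sudo", "mvn", "scala:run", "-Dlauncher=GenerateLLSpecificFiles"],
    ]


def run_pipeline(ttl_file, nt_file, scripts_dir, dry_run=True):
    """Run (or, with dry_run=True, only return) the commands.

    nt_file and scripts_dir are hypothetical parameters: the file that
    receives rapper's output and the folder containing the pom.xml.
    """
    commands = build_commands(ttl_file)
    if dry_run:
        return commands

    # rapper prints N-Triples to stdout, so redirect it into nt_file.
    with open(nt_file, "w") as out:
        subprocess.run(commands[0], stdout=out, check=True)

    # The two Maven launchers read their input/output paths from pom.xml.
    for cmd in commands[1:]:
        subprocess.run(cmd, cwd=scripts_dir, check=True)
    return commands


if __name__ == "__main__":
    for cmd in run_pipeline("turtle-20130808-links.ttl", "links.nt", "."):
        print(" ".join(cmd))
```

With dry_run=True nothing is executed and the commands are only printed, which is a safe way to check them before running on the server.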

*What's done so far:*

   1. About 7 million triples passed through the rapper phase without
   hitting a bug.
   2. Ran the Master LL file extraction (the output is in
    /root/hady_wikidata_extraction/Datasets/languagelinks/MasterLLfile.nt).
   3. Ran the specific LL files extraction (the output is now in
   /root/hady_wikidata_extraction/Datasets/languagelinks/LLfiles/).
   4. Running the new version of the wda Python script to get a larger
   initial dump of more than 7M triples.





*Benchmarking (for the 7 million triples on the lgd server):*

   1. Generating the Master LL file: 28 seconds
   2. Generating the specific LL files: 3 minutes, 10 seconds


7M triples may not be a very representative sample, but judging from the
benchmark above I expect the process to scale, because steps 3 and 4
mentioned above both have linear complexity.

Doing the math, and assuming that every entity has links in all languages
(around 120 languages), which I think is an extreme upper bound, the whole
process would take some tens of hours.
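
The arithmetic behind that estimate can be sketched as below. This is a rough extrapolation, not a measurement: it assumes strictly linear scaling from the 7M-triple benchmark and one triple per language link, and the entity counts passed in are hypothetical placeholders, since the real number of entities is not stated here.

```python
# Benchmark figures reported above for the 7M-triple extract.
BENCH_TRIPLES = 7_000_000
BENCH_SECONDS = 28 + (3 * 60 + 10)  # Master LL file + specific LL files


def estimate_hours(entities, languages_per_entity=120):
    """Extrapolate the runtime of steps 3 and 4, assuming linear scaling.

    entities is a hypothetical entity count; languages_per_entity=120 is
    the worst case in which every entity has a link in every language.
    """
    triples = entities * languages_per_entity
    seconds = triples / BENCH_TRIPLES * BENCH_SECONDS
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical entity counts, for illustration only.
    for entities in (10_000_000, 20_000_000):
        print(f"{entities:,} entities: ~{estimate_hours(entities):.1f} hours")
```

Under these assumptions, 10 million entities would take a bit over 10 hours, so the result depends almost entirely on the entity count that is plugged in.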


Thanks,
Regards,

-------------------------------------------------
Hady El-Sahar
Research Assistant
Center of Informatics Sciences | Nile University <http://nileuniversity.edu.eg/>
_______________________________________________
Dbpedia-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-developers
