Hi, Dimitris is right. Ahmed was referring to Import.scala, but that's probably not what's causing the problem.
Ahmed, please edit the config file as Dimitris said and the extraction should work. You only need Import.scala if you want to extract abstracts.

Anyway, I just added some code to make Import.scala more flexible. I also added a new argument in dump/pom.xml: users can now specify the name of the XML dump file, and Import.scala will automatically unzip it if the suffix is .gz or .bz2.

If you encounter any problems, let us know.

Cheers,
JC

On 21 April 2013 18:08, Jona Christopher Sahnwaldt <[email protected]> wrote:
> Hi,
>
> hm, no, sorry, in this case that won't work. The Import class is not
> configurable enough. I think Import.scala can't handle zipped files at
> all, so changing the name won't help either. I'll have a look, maybe I
> can fix this quickly.
>
> Cheers,
> JC
>
> On 21 April 2013 18:00, Dimitris Kontokostas <[email protected]> wrote:
>> Hi Ahmed,
>>
>> in the default configuration files you will find the following lines
>> # default:
>> # source=pages-articles.xml
>>
>> # alternatives:
>> # source=pages-articles.xml.bz2
>> # source=pages-articles.xml.gz
>>
>> You should comment / uncomment the ones that suit you
>>
>> Best,
>> Dimitris
>>
>> On Sun, Apr 21, 2013 at 2:24 AM, Ahmed Ktob <[email protected]> wrote:
>>>
>>> Hello guys,
>>>
>>> Today I was trying to use the extraction framework to extract data for the
>>> Arabic language. When it came to finding the dump file in the download
>>> directory, it didn't work, so after a while I figured out that a part of the
>>> code in the file Import.scala is written as follows:
>>>
>>> try {
>>>   for (language <- languages) {
>>>
>>>     val finder = new Finder[File](baseDir, language, "wiki")
>>>     val tagFile = if (requireComplete) Download.Complete else "pages-articles.xml"
>>>     val date = finder.dates(tagFile).last
>>>     val file = finder.file(date, "pages-articles.xml")
>>>
>>> I tried to change the name to "pages-articles.xml.bz2" and the extraction
>>> successfully passed this point.
>>>
>>> My point is, don't you think that we should make the changes I mentioned
>>> above? Because when we download the dump file, it comes with ".bz2" in the
>>> name.
>>>
>>> Best regards,
>>> Ahmed.
>>> --
>>> ------------------------------------------------
>>> Ahmed Ktob
>>> Dr. Taher Moulay University
>>> Department of Computer Science
>>> Saida, Algeria
>>> Tel : +213 554 811 151
>>> ------------------------------------------------
>>
>> --
>> Kontokostas Dimitris

------------------------------------------------------------------------------
Precog is a next-generation analytics platform capable of advanced
analytics on semi-structured data. The platform includes APIs for building
apps and a phenomenal toolset for data science. Developers can use
our toolset for easy data analysis & visualization. Get a free account!
http://www2.precog.com/precogplatform/slashdotnewsletter
_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
