Re: [Dbpedia-discussion] .bz2 problem

Ahmed Ktob Sun, 21 Apr 2013 10:48:04 -0700

Well, first I should mention that I am using IntelliJ IDEA on Windows 7.
I can't try on Linux right now because my work is on Windows and I don't
have enough free disk space ))


I am also following this tutorial [1] for the abstract extraction. I
followed it up to the data import step, which failed with this error:

java.lang.IllegalArgumentException: found no directory
C:\Users\AHMED\Desktop\arwiki/[YYYYMMDD] containing file
arwiki-[YYYYMMDD]-pages-articles.xml

So I started reading the Import.scala code and figured that maybe if I
changed the file name in this code:

val tagFile = if (requireComplete) Download.Complete else "pages-articles.xml"
val date = finder.dates(tagFile).last
val file = finder.file(date, "pages-articles.xml")

to "pages-articles.xml.bz2", it would work. I did, and it worked (I got
past this step).
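
For anyone hitting the same error: the path in the message suggests the
lookup goes through <baseDir>/<lang>wiki/<date>/<lang>wiki-<date>-<suffix>.
Here is a minimal sketch of that naming scheme. It is not the framework's
Finder class; the object DumpNames and its methods are hypothetical names
used only for illustration, and the list of suffixes is an assumption.

```scala
import java.io.File

// Hypothetical helper, NOT the framework's Finder class. It only mirrors the
// directory layout that the error message suggests:
//   <baseDir>/<lang>wiki/<date>/<lang>wiki-<date>-<suffix>
object DumpNames {

  // Build the expected location of one dump file.
  def dumpFile(baseDir: File, language: String, date: String, suffix: String): File = {
    val wiki = language + "wiki"
    new File(new File(new File(baseDir, wiki), date), s"$wiki-$date-$suffix")
  }

  // Suffixes to try, in order: the plain XML name first, then the compressed ones.
  val suffixes: Seq[String] =
    Seq("pages-articles.xml", "pages-articles.xml.bz2", "pages-articles.xml.gz")

  // Return the first candidate that exists on disk, if any.
  def findDump(baseDir: File, language: String, date: String): Option[File] =
    suffixes.map(dumpFile(baseDir, language, date, _)).find(_.isFile)
}
```

With only the downloaded .bz2 file in the date directory, a lookup for the
plain "pages-articles.xml" name finds nothing, which matches the error.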

After Dimitris's answer, I reverted my changes and uncommented the source
line he mentioned in both extraction.abstracts.properties and
extraction.default.properties, but I couldn't get past this step (same
error as above).
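
To check what a properties file really says about the source line, a small
sketch like the one below may help. It is an assumption-laden illustration,
not the framework's own config loader: the object SourceSetting and the
method sourceOf are hypothetical, and the fallback value is taken from the
"# default:" comment in the shipped files.

```scala
import java.io.StringReader
import java.util.Properties

// Hypothetical sketch, not the framework's own config loader.
object SourceSetting {

  // Fallback taken from the "# default:" comment in the properties files.
  val DefaultSource = "pages-articles.xml"

  // Parse properties text and return the value of "source", or the default
  // when the line is missing or still commented out with '#'.
  def sourceOf(propertiesText: String): String = {
    val props = new Properties()
    props.load(new StringReader(propertiesText))
    // Properties.load keeps trailing whitespace in values, so trim it.
    props.getProperty("source", DefaultSource).trim
  }
}
```

A line that still starts with '#' is treated as a comment, so the setting
only takes effect once the '#' is removed.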

I am using Maven 3.0.4, and to start Maven I just followed the guide:
clean -> install (on the parent POM of the DBpedia framework)
scala:run (on DBpedia Dump Extraction)

For now I only want the default extraction, not the abstracts, but I can't
find a guide for it. Any suggestions?

Thank you so much.

Cheers,
Ahmed.

[1]
https://github.com/dbpedia/extraction-framework/wiki/Dbpedia-Abstract-Extraction-step-by-step-guide


On 21 April 2013 18:19, Jona Christopher Sahnwaldt <[email protected]> wrote:

> Ahmed,
>
> if things still don't work for you, please tell us exactly what you
> are trying to do: which Maven launcher? How do you start it? Please
> attach a copy of the configuration files and Scala files that you
> edited and a text file containing the complete Maven output.
>
> Cheers,
> JC
>
> On 21 April 2013 19:17, Jona Christopher Sahnwaldt <[email protected]>
> wrote:
> > Hi,
> >
> > Dimitris is right. Ahmed was referring to Import.scala, but that's
> > probably not what's causing the problem.
> >
> > Ahmed, please try to edit the config file as Dimitris said and the
> > extraction should work. You only need Import.scala if you want to
> > extract abstracts.
> >
> > Anyway, I just added some code to make Import.scala more flexible. I
> > also added a new argument in dump/pom.xml: users can now specify the
> > name of the XML dump file, and Import.scala will automatically unzip
> > if the suffix is .gz or .bz2.
> >
> > If you encounter any problems, let us know.
> >
> > Cheers,
> > JC
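
A minimal sketch of what suffix-based unzipping could look like, for
readers who want to see the idea. This is not the code that was committed
to Import.scala: the object SuffixUnzip and its methods are hypothetical.
The JDK ships a gzip decoder but no bzip2 decoder, so the bzip2 case is
left as a function the caller must supply (an assumption of this sketch).

```scala
import java.io.{ByteArrayOutputStream, InputStream}
import java.util.zip.{GZIPInputStream, GZIPOutputStream}

// Hypothetical sketch of suffix-based unzipping; not the committed code.
object SuffixUnzip {

  // Wrap `in` according to the file name's suffix.
  // `bzip2` is supplied by the caller because the JDK has no bzip2 decoder
  // (a wrapper around a third-party library would be one option).
  def open(name: String, in: InputStream, bzip2: InputStream => InputStream): InputStream =
    if (name.endsWith(".gz")) new GZIPInputStream(in)
    else if (name.endsWith(".bz2")) bzip2(in)
    else in

  // Helper for trying it out: gzip a string in memory.
  def gzip(text: String): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new GZIPOutputStream(buffer)
    out.write(text.getBytes("UTF-8"))
    out.close()
    buffer.toByteArray
  }

  // Read a whole stream as UTF-8 text and close it.
  def readAll(in: InputStream): String = {
    val buffer = new ByteArrayOutputStream()
    val chunk = new Array[Byte](4096)
    var n = in.read(chunk)
    while (n != -1) { buffer.write(chunk, 0, n); n = in.read(chunk) }
    in.close()
    new String(buffer.toByteArray, "UTF-8")
  }
}
```

Keeping the choice in one function means the rest of the import code can
read the dump the same way whether it is plain XML or compressed.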
> >
> > On 21 April 2013 18:08, Jona Christopher Sahnwaldt <[email protected]> wrote:
> >> Hi,
> >>
> >> hm, no, sorry, in this case that won't work. The Import class is not
> >> configurable enough. I think Import.scala can't handle zipped files at
> >> all, so changing the name won't help either. I'll have a look, maybe I
> >> can fix this quickly.
> >>
> >> Cheers,
> >> JC
> >>
> >> On 21 April 2013 18:00, Dimitris Kontokostas <[email protected]> wrote:
> >>> Hi Ahmed,
> >>>
> >>> in the default configuration files you will find the following lines
> >>> # default:
> >>> # source=pages-articles.xml
> >>>
> >>> # alternatives:
> >>> # source=pages-articles.xml.bz2
> >>> # source=pages-articles.xml.gz
> >>>
> >>> You should comment / uncomment the ones that suit you.
> >>>
> >>> Best,
> >>> Dimitris
> >>>
> >>>
> >>>
> >>> On Sun, Apr 21, 2013 at 2:24 AM, Ahmed Ktob <[email protected]> wrote:
> >>>>
> >>>> Hello guys,
> >>>>
> >>>> Today I was trying to use the extraction framework to extract data for the
> >>>> Arabic language. When it comes to finding the file in the download directory
> >>>> (dump file), it didn't work, so after a while I figured that a part of code
> >>>> from the file Import.scala is written as follows:
> >>>>
> >>>> try {
> >>>> for (language <- languages) {
> >>>>
> >>>> val finder = new Finder[File](baseDir, language, "wiki")
> >>>> val tagFile = if (requireComplete) Download.Complete else "pages-articles.xml"
> >>>> val date = finder.dates(tagFile).last
> >>>> val file = finder.file(date, "pages-articles.xml")
> >>>>
> >>>> I tried to change the name to "pages-articles.xml.bz2" and the extraction
> >>>> successfully passed this point.
> >>>>
> >>>> My point is, don't you think that we should make the changes I mentioned
> >>>> above? Because when we download the dump file, it comes with ".bz2" in the
> >>>> name.
> >>>>
> >>>> Best regards,
> >>>> Ahmed.
> >>>> --
> >>>> ------------------------------------------------
> >>>> Ahmed Ktob
> >>>> Dr. Taher Moulay University
> >>>> Department of Computer Science
> >>>> Saida , Algeria
> >>>> Tel : +213 554 811 151
> >>>> ------------------------------------------------
> >>>>
> >>>>
> >>>>
> ------------------------------------------------------------------------------
> >>>> Precog is a next-generation analytics platform capable of advanced
> >>>> analytics on semi-structured data. The platform includes APIs for
> building
> >>>> apps and a phenomenal toolset for data science. Developers can use
> >>>> our toolset for easy data analysis & visualization. Get a free
> account!
> >>>> http://www2.precog.com/precogplatform/slashdotnewsletter
> >>>> _______________________________________________
> >>>> Dbpedia-discussion mailing list
> >>>> [email protected]
> >>>> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Kontokostas Dimitris
> >>>
> >>>
> >>>
>



-- 
------------------------------------------------
Ahmed Ktob
Dr. Taher Moulay University
Department of Computer Science
Saida , Algeria
Tel : +213 554 811 151
------------------------------------------------

