Re: [Dbpedia-discussion] dbpedia extraction framework on windows problems

Adrian Brasoveanu Mon, 08 Jul 2013 12:32:28 -0700

Hello Dimitris,

I am on vacation, so I don't really have a good Internet connection.
After reading your mail, I am successfully running an extraction with the
download.minimal.properties file (it is slow, but I was really curious).
However, I did this under Ubuntu, and it was really easy (following all the
steps from http://wiki.dbpedia.org/ExtractionOnUbuntu?v=fnb).
I did this because I wanted to see the difference between Windows and Linux
when running the bash scripts.


I also discovered I need a bigger virtual machine for such tasks :) (20 GB
is not enough :) ).

It took about 45 minutes to install everything under Linux (IntelliJ IDEA,
Apache, PHP, MySQL, MediaWiki, etc.).
Under Windows it took about 3 hours, mainly because I first
tried to do it with Eclipse.
After that I switched to IntelliJ IDEA, and it took another hour or so.

I noticed the following:
0) Eclipse is really bad when it comes to Maven and Scala, so I used
IntelliJ, since that was also recommended for Ubuntu.
I used the latest version of everything (IntelliJ Community, Maven, Scala,
and so on).
1) The Maven integration did not work out of the box on Windows, even in
IntelliJ, unlike on Linux (it took more than 30-40 minutes to fetch all the
files and dependencies), but there were nice and clear error messages that
tell you which jar is missing.
2) There were 4-5 jars missing (they are easy to find on the web anyway;
apache.commons.compress was one of them, for example,
and a certain version of scala-test).
3) All the paths need to be changed (clearly you're not using
/home/release/downloads or something similar on Windows).
4) There is a need for a good tutorial on how to run bash files under
Windows.
5) Most of the errors you will get will be path errors.
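
As a rough sketch of point 3, this is how the arguments of the "import"
launcher from my original pom.xml might look on Windows. The C:/dbpedia/...
locations are placeholders I made up for illustration, not paths the
framework expects. Only the two filesystem arguments change. Java accepts
forward slashes in Windows paths, so backslashes are not needed here.

```xml
<args>
    <!-- base folder of downloaded dumps (hypothetical Windows location) -->
    <arg>C:/dbpedia/wikipedia</arg>
    <!-- location of SQL file containing MediaWiki table definitions
         (hypothetical Windows location of a MediaWiki checkout) -->
    <arg>C:/dbpedia/mediawiki/core/maintenance/tables.sql</arg>
    <!-- JDBC URL of MySQL server, unchanged from the Linux setup -->
    <arg>jdbc:mysql://localhost/?characterEncoding=UTF-8</arg>
    <!-- require-download-complete -->
    <arg>true</arg>
    <!-- file name: pages-articles.xml{,.bz2,.gz} -->
    <arg>pages-articles.xml.bz2</arg>
    <!-- languages and article count ranges, comma-separated -->
    <arg>en</arg>
</args>
```

Without a drive letter, a path such as /home/release/... is resolved against
the current drive, which is why the stack trace in my first mail shows
\home\release\... as the missing path.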

That would be it, if you want to run it on Windows. However, I did not
add any settings for WAMP and MediaWiki on Windows.
Yes, I am willing to contribute to any Windows documentation or also to the
normal documentation.

As you probably guessed I am mostly interested in writing custom
extractors, that's where all the attraction of this framework lies,
not to mention that it's a good excuse to try Scala :).

Best regards,
Adrian



On Fri, Jul 5, 2013 at 6:31 PM, Dimitris Kontokostas <[email protected]> wrote:

> Hi Adrian,
>
> The project runs with Maven 3, and the dump pom defines various launchers.
> For a simple extraction we use the download and extraction launchers:
>
>
> https://github.com/dbpedia/extraction-framework/wiki/Extraction-Instructions
> The "run" script runs in bash, but if you take a look at it [1] you will
> quickly find out what is needed to run the launchers on Windows.
> You can equivalently create run configurations in IntelliJ / Eclipse.
>
> Once you get it running, could you take some time and create a guide for
> Windows users?
>
> Cheers,
> Dimitris
>
> [1] https://github.com/dbpedia/extraction-framework/blob/master/run
>
>
>  On Thu, Jul 4, 2013 at 11:55 AM, Adrian Brasoveanu <
> [email protected]> wrote:
>
>> Hello all,
>>
>> Sorry for re-posting this. First time I got an error message because I
>> was not subscribed to this list.
>>
>> I tried running the DBpedia Extraction Framework on Windows.
>> I used these settings in the pom.xml:
>>
>>     <launcher>
>>         <id>import</id>
>>         <mainClass>org.dbpedia.extraction.dump.sql.Import</mainClass>
>>         <jvmArgs>
>>             <jvmArg>-server</jvmArg>
>>         </jvmArgs>
>>         <args>
>>             <!-- base folder of downloaded dumps -->
>>             <arg>/home/release/wikipedia</arg>
>>             <!-- location of SQL file containing MediaWiki table definitions -->
>>             <arg>/home/release/data/projects/mediawiki/core/maintenance/tables.sql</arg>
>>             <!-- JDBC URL of MySQL server. Import creates a new database for each wiki. -->
>>             <arg>jdbc:mysql://localhost/?characterEncoding=UTF-8</arg>
>>             <!-- require-download-complete -->
>>             <arg>true</arg>
>>             <!-- file name: pages-articles.xml{,.bz2,.gz} -->
>>             <arg>pages-articles.xml.bz2</arg>
>>             <!-- languages and article count ranges, comma-separated, e.g. "de,en" or "@mappings" etc. -->
>>             <arg>en</arg>
>>         </args>
>>     </launcher>
>>
>> The error I got was this:
>>
>> [INFO] launcher 'import' selected => org.dbpedia.extraction.dump.sql.Import
>> java.lang.reflect.InvocationTargetException
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>     at java.lang.reflect.Method.invoke(Method.java:606)
>>     at org_scala_tools_maven_executions.MainHelper.runMain(MainHelper.java:161)
>>     at org_scala_tools_maven_executions.MainWithArgsInFile.main(MainWithArgsInFile.java:26)
>> Caused by: java.io.FileNotFoundException: \home\release\data\projects\mediawiki\core\maintenance\tables.sql (The system cannot find the path specified)
>>     at java.io.FileInputStream.open(Native Method)
>>     at java.io.FileInputStream.<init>(FileInputStream.java:138)
>>     at scala.io.Source$.fromFile(Source.scala:91)
>>     at scala.io.Source$.fromFile(Source.scala:76)
>>     at org.dbpedia.extraction.dump.sql.Import$.main(Import.scala:32)
>>     at org.dbpedia.extraction.dump.sql.Import.main(Import.scala)
>>     ... 6 more
>>
>> So it appears that I need to have mediawiki even though I don't want to
>> extract the abstracts...
>>
>> My questions are these:
>> 1) Assuming that I do not want to generate the abstracts yet (I just want
>> to see how it works and how to create custom dumps),
>> do I still need a local MediaWiki installation and a Wikipedia mirror?
>> The following pages do not mention that I need either one unless I want
>> to extract abstracts:
>> http://wiki.dbpedia.org/Documentation
>> https://github.com/dbpedia/extraction-framework/wiki/Extraction-Instructions
>> http://wiki.dbpedia.org/Documentation/ExtractionConfiguration?v=17gm
>>
>> 2) Does this process work on Windows? Do I still need to provide old
>> dumps in order to run this framework?
>>
>> 3) Where can I set up the default configuration file that I will use?
>> There is no default configuration specified in the pom file, so I cannot
>> see how the scala plugin would automatically pick up that config file
>> when I run it.
>>
>>
>> Best regards,
>> Adrian
>>
>>
>> ------------------------------------------------------------------------------
>> This SF.net email is sponsored by Windows:
>>
>> Build for Windows Store.
>>
>> http://p.sf.net/sfu/windows-dev2dev
>> _______________________________________________
>> Dbpedia-discussion mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>>
>>
>
>
> --
> Kontokostas Dimitris
>

