Re: [Dbpedia-gsoc] GSOC newbie

Amitrajit Sarkar Sun, 15 Mar 2015 13:10:58 -0700

hi Gabriel,

thanks for the instructions. this may sound noobish, but I'm stuck
with the Wikipedia dump extraction.

after cloning, I ran '$ mvn clean install' and all went well. I modified
download.minimal.properties with
'base-dir=/home/thenet/Code/DBpedia/downloads/' and
'download=la:pages-articles.xml.bz2' (why the Latin dump? because it is
small, which is usually a good thing for a first test). then '$ ../run
download config=download.minimal.properties' ran fine.

I changed extraction.abstracts.properties to include
'base-dir=/home/thenet/Code/DBpedia/downloads/' and
'source=lawiki-20150310-pages-articles.xml.bz2' (the bz2 exists in the
base-dir/lawiki/20150310/ directory). on running '$ ../run extraction
extraction.abstracts.properties' I get:
'parsing /home/thenet/Code/DBpedia/downloads/wikipedias.csv
java.lang.reflect.InvocationTargetException
...
at
org.dbpedia.extraction.util.ConfigUtils$.parseLanguages(ConfigUtils.scala:83)
...'.
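
One way to rule out a simple path problem is to check that the dump sits
where the description above says it does. The following is only a minimal
sketch and is not part of the extraction framework: `expected_dump_path` is
a hypothetical helper, and the base-dir/<wiki>/<date>/ layout is assumed
solely from the paths quoted in this message.

```python
from pathlib import Path


def expected_dump_path(base_dir: str, wiki: str, date: str, suffix: str) -> Path:
    """Build <base_dir>/<wiki>/<date>/<wiki>-<date>-<suffix>.

    The layout is an assumption taken from the paths quoted in this message,
    not from the framework's documentation.
    """
    return Path(base_dir) / wiki / date / f"{wiki}-{date}-{suffix}"


if __name__ == "__main__":
    # Values copied from the properties files described above.
    dump = expected_dump_path(
        "/home/thenet/Code/DBpedia/downloads/",
        "lawiki",
        "20150310",
        "pages-articles.xml.bz2",
    )
    print(dump, "exists" if dump.is_file() else "is missing")
```

If the file is reported as present, the path settings are probably not the
cause and the languages parameter is the more likely suspect.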

I guessed it had something to do with the languages parameter, so I tried
changing 'languages=10000-' to 'languages=la' in the extraction properties
file. that wasn't much better. while trying the same with
extraction.default.properties I noticed that there is no 'extractor.la=...'
parameter, although going by this
<http://wiki.dbpedia.org/DeveloperDocumentation/Extractor> there are a few
extractors applicable to all languages. I'm probably messing up a
parameter. I noticed that this question
<http://stackoverflow.com/questions/28318185/dbpedia-extraction-framewrok-failure-during-extraction-of-dbpedia-dump>
exists on Stack Overflow, but it has not been answered. I'd like to fix it.

thank you.
Amitrajit.

On Sun, Mar 15, 2015 at 10:13 PM, Gabriel Fair <[email protected]>
wrote:

> Hi Amitrajit,
>
> I have found that the best place to start is by running a DBpedia server
> on your local machine. I followed these instructions
> <http://wiki.dbpedia.org/Documentation> to check out the text extractor
> and run a DBpedia server on my machine. With Maven, everything you
> would otherwise have to download is fetched for you. Let me know if you
> have any questions. I asked a similar question
> <http://stackoverflow.com/q/29055508/635160> on Stack Overflow, if that
> helps you.
>
> Thanks,
>
> On Fri, Mar 13, 2015 at 2:05 PM, Amitrajit Sarkar <[email protected]>
> wrote:
>
>> hi..
>>
>> my name is Amitrajit. I am a CS undergraduate student at Jadavpur
>> University, India. this is my first time applying for Google Summer of
>> Code. the ideas 'fact extraction from Wikipedia text' and 'reverse
>> engineering and aligning Freebase with DBpedia' caught my attention. (I
>> hope) I understand what the topics mean (I once took an online course from
>> Stanford on Natural Language Understanding, which outlined a few of the
>> concepts), but I was looking for some guidance on how to get started
>> exploring the framework before I write my application. I was building from
>> source when I realized that the Wikimedia dump files will take a while to
>> download (or perhaps I'm looking at the wrong files). is there anything
>> else I could try first? perhaps write a wrapper-parser to extract data
>> from a single Wikipedia page, or something similar to get warmed up to
>> what everyone at DBpedia does..
>>
>> any help would be greatly appreciated. thank you..
>>
>>
>> ------------------------------------------------------------------------------
>> Dive into the World of Parallel Programming The Go Parallel Website,
>> sponsored
>> by Intel and developed in partnership with Slashdot Media, is your hub
>> for all
>> things parallel software development, from weekly thought leadership
>> blogs to
>> news, videos, case studies, tutorials and more. Take a look and join the
>> conversation now. http://goparallel.sourceforge.net/
>> _______________________________________________
>> Dbpedia-gsoc mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/dbpedia-gsoc
>>
>>
>
