Hi Rahul,

Could you attach the properties files you used for download and extraction,
and also the error log? Thanks.

JC
On Apr 27, 2013 11:35 AM, "Rahul Sharnagat" <[email protected]> wrote:

> Sorry, again forgot to add the gsoc mailing list :)
>
>
> On Sat, Apr 27, 2013 at 1:53 PM, Rahul Sharnagat <[email protected]>wrote:
>
>> Hi Christopher, Dimitris,
>>      Thanks, Dimitris, for the proxy help. The dump download went smoothly.
>> But while running the extraction with extraction.default.properties, there
>> was an error about a missing wikipedias.csv. I went through the mail archive
>> and found a solution
>> here<http://www.mail-archive.com/[email protected]/msg03921.html>.
>> I changed languages to en instead of 10000-. A new error has occurred, for
>> which I have attached the logs: the framework is trying to find arwiki when
>> there is none in base-dir.
>>
>> *Regarding proposal*
>>
>>      I have started writing my proposal and am facing some problems with
>> proposing a tentative solution. Since there are three objectives for this
>> idea, I am providing the abstract thoughts I have for each problem.
>>
>>    - Extending the DBpedia mapping wiki so that editors can provide the
>>    rules for the data formats that need to be extracted
>>
>> I looked into the code of the dataparser. Currently, for month information
>> say, the config file extraction.config.dataparser defines how months in each
>> language need to be parsed. So the solution to the problem would be to
>> define a module that stores, accesses, and specifies the rules for this
>> information instead of writing Scala code. Is this what is expected from
>> this task? As Christopher mentioned in issue #36, building a DSL could be a
>> solution.
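To make the idea concrete, here is a minimal sketch of what such a rule module might look like, assuming the rules are loaded from a declarative source such as a mapping-wiki page. All names here are hypothetical, not actual framework code.

```scala
// Hypothetical sketch: month-name parsing rules kept as data instead of
// Scala code. In a real implementation these maps would be populated from
// a mapping-wiki page or config file rather than hard-coded here.
object MonthRuleStore {
  // language -> (surface form -> month number)
  private val rules: Map[String, Map[String, Int]] = Map(
    "en" -> Map("january" -> 1, "jan" -> 1, "february" -> 2),
    "de" -> Map("januar" -> 1, "februar" -> 2)
  )

  // The parser asks the rule store instead of embedding the knowledge.
  def parseMonth(lang: String, token: String): Option[Int] =
    rules.get(lang).flatMap(_.get(token.toLowerCase))
}
```

The point of the sketch is only that editors would maintain the data behind `rules`, while the Scala side shrinks to a generic lookup.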
>>
>>    - Moving data types from extraction code to the mapping wiki ontology
>>
>> This seems easy to do; I just didn't find the code that modifies the wiki,
>> or how the mapping wiki extracts information from this code. (I am not very
>> clear on this and need pointers in the code.)
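As a rough sketch of this direction (all names hypothetical, not the framework's actual API), the extractor would resolve a property's datatype from the ontology definition instead of carrying it in extraction code:

```scala
// Hypothetical sketch: datatypes resolved from ontology definitions that
// would be loaded from the mapping wiki, not hard-coded in the extractor.
case class OntologyProperty(name: String, range: String)

// Stand-in for an ontology loaded from the wiki.
val ontologyProperties: Map[String, OntologyProperty] = Map(
  "birthDate" -> OntologyProperty("birthDate", "xsd:date"),
  "height"    -> OntologyProperty("height", "xsd:double")
)

// The extractor asks the ontology for the datatype instead of knowing it.
def datatypeFor(propertyName: String): Option[String] =
  ontologyProperties.get(propertyName).map(_.range)
```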
>>
>>    - Extending the mapping wiki so that tests can be specified on a wiki
>>    page and the community can contribute
>>
>> Do I need to elaborate extensively on what kind of tests I should be
>> implementing? It would be very helpful if you could elaborate on testing. I
>> understand that we need a module that takes input from mapping wiki users
>> for a particular language, a way to define these tests, and a way to
>> validate the extraction results. Can you give me some pointers regarding
>> the implementation of test cases?
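One way to picture such a testing module, purely as a sketch with hypothetical names: a wiki page contributes (input wikitext, expected value) pairs, and a runner feeds each input through the extractor and compares. The regex-based "extractor" below is a toy stand-in for the real framework.

```scala
// Hypothetical sketch: a wiki-contributed extraction test is an
// (input wikitext, expected value) pair checked by a generic runner.
case class ExtractionTest(wikiText: String, expected: String)

def runTest(extract: String => String, test: ExtractionTest): Boolean =
  extract(test.wikiText) == test.expected

// Toy extractor: pull the value of a birth_date infobox parameter.
// The real framework's extractors would be plugged in here instead.
val toyExtractor: String => String = text =>
  """birth_date\s*=\s*([^}\s|]+)""".r
    .findFirstMatchIn(text)
    .map(_.group(1))
    .getOrElse("")

val sample =
  ExtractionTest("{{Infobox person | birth_date = 1990-01-01}}", "1990-01-01")
```

A community member would only write the `ExtractionTest` pairs on the wiki; the runner and extractor live in the framework.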
>>
>> Thanks
>>
>>
>>
>> On Thu, Apr 25, 2013 at 12:19 PM, Dimitris Kontokostas <[email protected]
>> > wrote:
>>
>>> Hi Rahul,
>>>
>>> You should put your main effort into your application, but I think this
>>> task will also help you get a better idea of what to expect.
>>>
>>> Regarding the proxy, we have the following launcher in dump/pom.xml;
>>> please uncomment and adapt the proxy settings:
>>>     <launcher>
>>>         <id>download</id>
>>>         <mainClass>org.dbpedia.extraction.dump.download.Download</mainClass>
>>>         <!--
>>>         <jvmArgs>
>>>             <jvmArg>-Dhttp.proxyHost=proxy.server.com</jvmArg>
>>>             <jvmArg>-Dhttp.proxyPort=80</jvmArg>
>>>             <jvmArg>-Dhttp.nonProxyHosts="localhost|127.0.0.1"</jvmArg>
>>>         </jvmArgs>
>>>         -->
>>>         <!-- ../run download config=download.properties -->
>>>     </launcher>
>>>
>>>
>>> On Thu, Apr 25, 2013 at 5:29 AM, Rahul Sharnagat 
>>> <[email protected]>wrote:
>>>
>>>> Hi Jona,
>>>>      I think I know the problem. I am on my institute network, which
>>>> works through a proxy server. To get Maven working I had to set the proxy
>>>> settings in settings.xml; I used to provide it to the mvn command, but
>>>> currently I am putting it in the $HOME/.m2/ folder. Does the wiki dump
>>>> download accept the Maven proxy settings, or only the global http_proxy
>>>> environment variable? Maybe this is the source of the error. I will try
>>>> to get on a no-proxy network and try again.
>>>>
>>>>
>>>>
>>>> On Thu, Apr 25, 2013 at 4:09 AM, Jona Christopher Sahnwaldt <
>>>> [email protected]> wrote:
>>>>
>>>>> On 24 April 2013 20:55, Rahul Sharnagat <[email protected]> wrote:
>>>>> > Hi Dimitris,
>>>>> >     For the last few days, I have been trying to understand the
>>>>> > dataparser and mapping code. I also went a little higher in the
>>>>> > hierarchy to understand the dependencies. Things are getting clearer
>>>>> > now, but it will take some more time to understand all the nuances. I
>>>>> > also successfully installed the extraction framework.
>>>>> >     But there is one problem in getting a dump to work on. As per the
>>>>> > documentation (here and here), I could not find
>>>>> > download.properties.file in the master branch in the dump folder. But
>>>>> > I explored the folder and found download.minimal.properties. I tweaked
>>>>> > it according to the instructions for my requirements, but I am getting
>>>>> > an error (attached are the full debug log and the tweaked
>>>>> > minimal.properties). I tried to find a similar error in the archived
>>>>> > messages but could not. Can you help me in this regard?
>>>>>
>>>>> Strange. Could you just try again? It works for me. Maybe it was a
>>>>> temporary problem at Wikimedia. Or maybe something is wrong with your
>>>>> network? What does http://dumps.wikimedia.org/enwiki/ look like in
>>>>> your browser?
>>>>>
>>>>> I updated extraction-framework to the latest version from GitHub,
>>>>> copied your download.minimal.properties file into my dump/ folder,
>>>>> changed the value of base-dir and executed
>>>>>
>>>>> ../clean-install-run download config=download.minimal.properties
>>>>>
>>>>> Below is an excerpt from the result.
>>>>>
>>>>> Cheers,
>>>>> JC
>>>>>
>>>>> [INFO] launcher 'download' selected =>
>>>>> org.dbpedia.extraction.dump.download.Download
>>>>> done: 0 -
>>>>> todo: 1 - wiki=en,locale=en
>>>>> downloading 'http://dumps.wikimedia.org/enwiki/' to
>>>>> '/Users/jcsahnwaldt/tmp/enwiki/index.html'
>>>>> read 3.6132812 KB of 3.6132812 KB in 0.014 seconds (258.0915 KB/s)
>>>>> downloading 'http://dumps.wikimedia.org/enwiki/20130403/' to
>>>>> '/Users/jcsahnwaldt/tmp/enwiki/20130403/index.html'
>>>>> read 102.23535 KB of 102.23535 KB in 0.907 seconds (112.71813 KB/s)
>>>>> date page 'http://dumps.wikimedia.org/enwiki/20130403/' has all files
>>>>> [pages-articles.xml.bz2]
>>>>> downloading 'http://dumps.wikimedia.org/enwiki/20130403/enwiki-20130403-pages-articles.xml.bz2'
>>>>> to '/Users/jcsahnwaldt/tmp/enwiki/20130403/enwiki-20130403-pages-articles.xml.bz2'
>>>>>
>>>>>
>>>>> >     I am also reading the DBpedia mapping wiki to understand how the
>>>>> > ontology is created and how the infobox-to-ontology mapping is done,
>>>>> > and to relate it to the code. Since a little more than a week is left
>>>>> > for the final proposal, I want to create a good draft by the 1st. I
>>>>> > will try to send a rough draft by tomorrow.
>>>>> >
>>>>> > Thanks.
>>>>> >
>>>>> >
>>>>> >
>>>>> > On Tue, Apr 23, 2013 at 11:58 AM, Rahul Sharnagat <
>>>>> [email protected]>
>>>>> > wrote:
>>>>> >>
>>>>> >> Thanks, Dimitris.
>>>>> >> I will look into this issue and the related code and get back to you
>>>>> >> if I face any problems.
>>>>> >>
>>>>> >>
>>>>> >> On Mon, Apr 22, 2013 at 6:07 PM, Dimitris Kontokostas <
>>>>> [email protected]>
>>>>> >> wrote:
>>>>> >>>
>>>>> >>> Hi Rahul,
>>>>> >>>
>>>>> >>> A very good warm-up task for this idea is issue #36
>>>>> >>> (https://github.com/dbpedia/extraction-framework/issues/36).
>>>>> >>> With this task you will get to know the parser internals and see the
>>>>> >>> actual need to crowd-source the rules.
>>>>> >>>
>>>>> >>> Take a first look, and we'll be available for further details.
>>>>> >>>
>>>>> >>> Cheers,
>>>>> >>> Dimitris
>>>>> >>>
>>>>> >>>
>>>>> >>> On Mon, Apr 22, 2013 at 5:02 AM, Rahul Sharnagat <
>>>>> [email protected]>
>>>>> >>> wrote:
>>>>> >>>>
>>>>> >>>> Sorry, I forgot to add the mailing list; I just hit the reply button. :)
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> On Mon, Apr 22, 2013 at 2:19 AM, Dimitris Kontokostas
>>>>> >>>> <[email protected]> wrote:
>>>>> >>>>>
>>>>> >>>>> Please put the mailing list in cc :)
>>>>> >>>>>
>>>>> >>>>> Cheers,
>>>>> >>>>> Dimitris
>>>>> >>>>>
>>>>> >>>>> ----
>>>>> >>>>> Sent from my mobile
>>>>> >>>>>
>>>>> >>>>> On 21 Apr 2013, 7:55 p.m., "Rahul Sharnagat"
>>>>> >>>>> <[email protected]> wrote:
>>>>> >>>>>
>>>>> >>>>>> Hi Dimitris,
>>>>> >>>>>>         Thanks for the reply.
>>>>> >>>>>>         I am looking for a warm-up task related to this idea. I
>>>>> >>>>>> have started reading about Scala and DBpedia. It should not take
>>>>> >>>>>> much time to get accustomed to Scala, since I have previously
>>>>> >>>>>> worked in Haskell. Please give me some direction for a warm-up
>>>>> >>>>>> task.
>>>>> >>>>>>
>>>>> >>>>>>
>>>>> >>>>>> On Sun, Apr 21, 2013 at 9:39 PM, Dimitris Kontokostas
>>>>> >>>>>> <[email protected]> wrote:
>>>>> >>>>>>>
>>>>> >>>>>>> Hi Rahul,
>>>>> >>>>>>>
>>>>> >>>>>>> The application period has not started yet, so there is still
>>>>> >>>>>>> time left :)
>>>>> >>>>>>>
>>>>> >>>>>>> Did you read the idea page [1]? The description is pretty long,
>>>>> >>>>>>> but you can ask about anything you don't understand completely.
>>>>> >>>>>>> Everything should be clear when you write your application ;)
>>>>> >>>>>>>
>>>>> >>>>>>> Best,
>>>>> >>>>>>> Dimitris
>>>>> >>>>>>>
>>>>> >>>>>>> [1]
>>>>> http://wiki.dbpedia.org/gsoc2013/ideas/CrowdsourceTestsAndRules
>>>>> >>>>>>>
>>>>> >>>>>>>
>>>>> >>>>>>> On Sun, Apr 21, 2013 at 4:06 PM, Rahul Sharnagat
>>>>> >>>>>>> <[email protected]> wrote:
>>>>> >>>>>>>>
>>>>> >>>>>>>> Hi Dimitris,
>>>>> >>>>>>>>
>>>>> >>>>>>>>     I am Rahul Sharnagat, a master's student at IIT Bombay. I
>>>>> >>>>>>>> am planning to apply for a DBpedia GSoC project.
>>>>> >>>>>>>>
>>>>> >>>>>>>>     I am interested in the project "Crowdsource tests and
>>>>> >>>>>>>> extraction rules". I am working on Named Entity Recognition
>>>>> >>>>>>>> (NER) and entity mining as my master's project, and I think
>>>>> >>>>>>>> working with DBpedia would help me a lot with that. I interned
>>>>> >>>>>>>> at Yahoo last summer, working on refining news indexes.
>>>>> >>>>>>>>
>>>>> >>>>>>>>     I know I am late due to my final exams, but it would be
>>>>> >>>>>>>> great if you could help me get started. I have been reading the
>>>>> >>>>>>>> DBpedia wiki pages and have also downloaded the code from
>>>>> >>>>>>>> GitHub.
>>>>> >>>>>>>>
>>>>> >>>>>>>>
>>>>> >>>>>>>> --
>>>>> >>>>>>>> Best Regards,
>>>>> >>>>>>>> Rahul Sharnagat
>>>>> >>>>>>>> CSE MTech, IITB
>>>>> >>>>>>>>
>>>>> >>>>>>>>
>>>>> >>>>>>>>
>>>>> >>>>>>>>
>>>>> ------------------------------------------------------------------------------
>>>>> >>>>>>>> Precog is a next-generation analytics platform capable of
>>>>> advanced
>>>>> >>>>>>>> analytics on semi-structured data. The platform includes APIs
>>>>> for
>>>>> >>>>>>>> building
>>>>> >>>>>>>> apps and a phenomenal toolset for data science. Developers
>>>>> can use
>>>>> >>>>>>>> our toolset for easy data analysis & visualization. Get a free
>>>>> >>>>>>>> account!
>>>>> >>>>>>>> http://www2.precog.com/precogplatform/slashdotnewsletter
>>>>> >>>>>>>> _______________________________________________
>>>>> >>>>>>>> Dbpedia-gsoc mailing list
>>>>> >>>>>>>> [email protected]
>>>>> >>>>>>>> https://lists.sourceforge.net/lists/listinfo/dbpedia-gsoc
>>>>> >>>>>>>>
>>>>> >>>>>>>
>>>>> >>>>>>>
>>>>> >>>>>>>
>>>>> >>>>>>> --
>>>>> >>>>>>> Kontokostas Dimitris
>>>>> >>>>>>
>>>>> >>>>>>
>>>>> >>>>>>
>>>>> >>>>>>
>>>>> >>>>>> --
>>>>> >>>>>> Best Regards,
>>>>> >>>>>> Rahul Sharnagat
>>>>> >>>>>> CSE MTech, IITB
>>>>> >>>>>> H14, B505
>>>>> >>>>>> +91.9860.451.056
>>>>> >>>>
>>>>> >>>>
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> --
>>>>> >>>> Best Regards,
>>>>> >>>> Rahul Sharnagat
>>>>> >>>> CSE MTech, IITB
>>>>> >>>> H14, B505
>>>>> >>>> +91.9860.451.056
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>> --
>>>>> >>> Kontokostas Dimitris
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> --
>>>>> >> Best Regards,
>>>>> >> Rahul Sharnagat
>>>>> >> CSE MTech, IITB
>>>>> >>
>>>>> >
>>>>> >
>>>>> >
>>>>> > --
>>>>> > Best Regards,
>>>>> > Rahul Sharnagat
>>>>> > CSE MTech, IITB
>>>>> > H14, B505
>>>>> > +91.9860.451.056
>>>>> >
>>>>> >
>>>>> ------------------------------------------------------------------------------
>>>>> > Try New Relic Now & We'll Send You this Cool Shirt
>>>>> > New Relic is the only SaaS-based application performance monitoring
>>>>> service
>>>>> > that delivers powerful full stack analytics. Optimize and monitor
>>>>> your
>>>>> > browser, app, & servers with just a few lines of code. Try New Relic
>>>>> > and get this awesome Nerd Life shirt!
>>>>> http://p.sf.net/sfu/newrelic_d2d_apr
>>>>> > _______________________________________________
>>>>> > Dbpedia-gsoc mailing list
>>>>> > [email protected]
>>>>> > https://lists.sourceforge.net/lists/listinfo/dbpedia-gsoc
>>>>> >
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Best Regards,
>>>> Rahul Sharnagat
>>>> CSE MTech, IITB
>>>> H14, B505
>>>> +91.9860.451.056
>>>>
>>>
>>>
>>>
>>> --
>>> Kontokostas Dimitris
>>>
>>
>>
>>
>> --
>> Best Regards,
>> Rahul Sharnagat
>> CSE MTech, IITB
>> H14, B505
>> +91.9860.451.056
>>
>
>
>
> --
> Best Regards,
> Rahul Sharnagat
> CSE MTech, IITB
> H14, B505
> +91.9860.451.056
>
>
>
