Sorry, again forgot to add the GSoC mailing list :)
On Sat, Apr 27, 2013 at 1:53 PM, Rahul Sharnagat <[email protected]> wrote:
> Hi Christopher, Dimitris,
> Thanks, Dimitris, for the proxy help. The dump download went smoothly. But
> while running the extraction with extraction.default.properties, there was
> an error about a missing wikipedias.csv. I went through the mail archive and
> found a solution
> here<http://www.mail-archive.com/[email protected]/msg03921.html>.
> I changed languages to en instead of 10000-. A new error has occurred, for
> which I have attached the logs: it is trying to find arwiki when there is
> none in base-dir.
>
> *Regarding the proposal*
>
> I have started writing my proposal and am facing some problems in
> proposing a tentative solution. Since there are three objectives for this
> idea, I am outlining the rough thoughts I have for each one:
>
> - Extending the DBpedia mapping wiki so that editors can provide
> the rules for the data formats that need to be extracted
>
> I looked into the dataparser code. Currently, for month
> information, the config file extraction.config.dataparser defines how months
> in each language need to be parsed. So the solution would be to define a
> module that stores, accesses, and specifies the rules for this information
> instead of writing Scala code. Is this what is expected from this task?
> As Christopher mentioned in issue #36, building a DSL could be a solution.
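>
> For example, I imagine a wiki template along these lines (the template and
> parameter names here are purely hypothetical, just to illustrate the idea):

```text
{{DateFormatRule
| language = de
| monthNames = Januar=1; Februar=2; März=3
}}
```

> Editors could then maintain such rules directly on the mapping wiki, and
> the dataparser would load them at extraction time instead of using
> hard-coded Scala tables.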
>
> - Moving data types from the extraction code to the mapping wiki ontology
>
> This seems easy to do, but I didn't find the code that modifies the wiki,
> or how the mapping wiki extracts information from this code. (I am not very
> clear on this and need pointers to the relevant code.)
>
> - Extending the mapping wiki so that tests can be specified on a wiki page
> and the community can contribute
>
> Do I need to elaborate extensively on what kind of tests I should be
> implementing? It would be very helpful if you could elaborate on testing. I
> understand that we need a module that takes input from mapping wiki users
> for a particular language, plus a way to define these tests and validate
> the extraction results. Can you give me some pointers regarding the
> implementation of test cases?
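>
> To make this concrete, a test specified on a wiki page might look something
> like this (all template and parameter names are hypothetical):

```text
{{ExtractionTest
| language = en
| input = {{Birth date|1980|7|3}}
| expectedProperty = birthDate
| expectedValue = 1980-07-03
}}
```

> A testing module could read such pages, run the extractor on the input, and
> compare the result against the expected property and value.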
>
> Thanks
>
>
>
> On Thu, Apr 25, 2013 at 12:19 PM, Dimitris Kontokostas
> <[email protected]>wrote:
>
>> Hi Rahul,
>>
>> You should put your main effort into your application, but I think this
>> task will also help you get a better idea of what to expect.
>>
>> Regarding the proxy, we have the following launcher in dump/pom.xml;
>> please uncomment it and adapt the proxy settings:
>> <launcher>
>>   <id>download</id>
>>   <mainClass>org.dbpedia.extraction.dump.download.Download</mainClass>
>>   <!--
>>   <jvmArgs>
>>     <jvmArg>-Dhttp.proxyHost=proxy.server.com</jvmArg>
>>     <jvmArg>-Dhttp.proxyPort=80</jvmArg>
>>     <jvmArg>-Dhttp.nonProxyHosts="localhost|127.0.0.1"</jvmArg>
>>   </jvmArgs>
>>   -->
>>   <!-- ../run download config=download.properties -->
>> </launcher>
>>
>>
>> On Thu, Apr 25, 2013 at 5:29 AM, Rahul Sharnagat
>> <[email protected]>wrote:
>>
>>> Hi Jona,
>>> I think I know the problem. I am on my institute network, which
>>> works through a proxy server. To get Maven working, I had to set the
>>> proxy settings in settings.xml and pass it to the mvn command; currently
>>> I am putting it in the $HOME/.m2/ folder. Does the wiki dump download
>>> accept the Maven proxy settings, or the global http_proxy environment
>>> variable? Maybe this is the source of the error. I will try to get on a
>>> no-proxy network and try again.
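>>>
>>> For reference, this is roughly the proxy section of the settings.xml I am
>>> using (the host and port are placeholders for my institute's actual proxy):

```xml
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0">
  <proxies>
    <proxy>
      <id>institute-proxy</id>
      <active>true</active>
      <protocol>http</protocol>
      <!-- placeholder values -->
      <host>proxy.example.com</host>
      <port>80</port>
      <nonProxyHosts>localhost|127.0.0.1</nonProxyHosts>
    </proxy>
  </proxies>
</settings>
```

>>> As far as I understand, this only configures Maven's own downloads; it
>>> does not set JVM system properties like http.proxyHost for a launched
>>> program.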
>>>
>>>
>>>
>>> On Thu, Apr 25, 2013 at 4:09 AM, Jona Christopher Sahnwaldt
>>> <[email protected]> wrote:
>>>
>>>> On 24 April 2013 20:55, Rahul Sharnagat <[email protected]> wrote:
>>>> > Hi Dimitris,
>>>> > For the last few days, I have been trying to understand the dataparser
>>>> > and mapping code. I also went a little higher in the hierarchy to
>>>> > understand the dependencies. Things are getting clearer now, but it
>>>> > will take some more time to understand all the nuances. I also
>>>> > successfully installed the extraction framework.
>>>> > But there is one problem in getting a dump to work on. As per the
>>>> > documentation (here and here), I could not find the download.properties
>>>> > file in the master branch in the dump folder. But I explored the folder
>>>> > and found download.minimal.properties. I tweaked it according to the
>>>> > instructions for my requirements, but I am getting an error (attached
>>>> > are the full debug log and the tweaked minimal.properties). I tried to
>>>> > find a similar error in the archived messages but could not find one.
>>>> > Can you help me in this regard?
>>>>
>>>> Strange. Could you just try again? It works for me. Maybe it was a
>>>> temporary problem at Wikimedia. Or maybe something is wrong with your
>>>> network? What does http://dumps.wikimedia.org/enwiki/ look like in
>>>> your browser?
>>>>
>>>> I updated extraction-framework to the latest version from GitHub,
>>>> copied your download.minimal.properties file into my dump/ folder,
>>>> changed the value of base-dir and executed
>>>>
>>>> ../clean-install-run download config=download.minimal.properties
>>>>
>>>> Below is an excerpt from the result.
>>>>
>>>> Cheers,
>>>> JC
>>>>
>>>> [INFO] launcher 'download' selected =>
>>>> org.dbpedia.extraction.dump.download.Download
>>>> done: 0 -
>>>> todo: 1 - wiki=en,locale=en
>>>> downloading 'http://dumps.wikimedia.org/enwiki/' to
>>>> '/Users/jcsahnwaldt/tmp/enwiki/index.html'
>>>> read 3.6132812 KB of 3.6132812 KB in 0.014 seconds (258.0915 KB/s)
>>>> downloading 'http://dumps.wikimedia.org/enwiki/20130403/' to
>>>> '/Users/jcsahnwaldt/tmp/enwiki/20130403/index.html'
>>>> read 102.23535 KB of 102.23535 KB in 0.907 seconds (112.71813 KB/s)
>>>> date page 'http://dumps.wikimedia.org/enwiki/20130403/' has all files
>>>> [pages-articles.xml.bz2]
>>>> downloading 'http://dumps.wikimedia.org/enwiki/20130403/enwiki-20130403-pages-articles.xml.bz2'
>>>> to '/Users/jcsahnwaldt/tmp/enwiki/20130403/enwiki-20130403-pages-articles.xml.bz2'
>>>>
>>>>
>>>> > I am also reading the DBpedia mapping wiki to understand how the
>>>> > ontology is created and how the infobox-to-ontology mapping is done,
>>>> > and relating it to the code. Since little more than a week is left
>>>> > before the final proposal, I want to create a good draft by the 1st.
>>>> > I will try to send a rough draft by tomorrow.
>>>> >
>>>> > Thanks.
>>>> >
>>>> >
>>>> >
>>>> > On Tue, Apr 23, 2013 at 11:58 AM, Rahul Sharnagat
>>>> > <[email protected]> wrote:
>>>> >>
>>>> >> Thanks Dimitris.
>>>> >> I will look into this issue and the related code, and get back to
>>>> >> you if I face any problems.
>>>> >>
>>>> >>
>>>> >> On Mon, Apr 22, 2013 at 6:07 PM, Dimitris Kontokostas
>>>> >> <[email protected]> wrote:
>>>> >>>
>>>> >>> Hi Rahul,
>>>> >>>
>>>> >>> A very good warm-up task for this idea is issue #36
>>>> >>> (https://github.com/dbpedia/extraction-framework/issues/36)
>>>> >>> With this task you will get to know the parser internals and see the
>>>> >>> actual need to crowd-source the rules.
>>>> >>>
>>>> >>> Take a first look, and we'll be available for further details.
>>>> >>>
>>>> >>> Cheers,
>>>> >>> Dimitris
>>>> >>>
>>>> >>>
>>>> >>> On Mon, Apr 22, 2013 at 5:02 AM, Rahul Sharnagat
>>>> >>> <[email protected]> wrote:
>>>> >>>>
>>>> >>>> Sorry, I forgot to add the mailing list; I just hit the reply button. :)
>>>> >>>>
>>>> >>>>
>>>> >>>> On Mon, Apr 22, 2013 at 2:19 AM, Dimitris Kontokostas
>>>> >>>> <[email protected]> wrote:
>>>> >>>>>
>>>> >>>>> Please put the mailing list in cc :)
>>>> >>>>>
>>>> >>>>> Cheers,
>>>> >>>>> Dimitris
>>>> >>>>>
>>>> >>>>> ----
>>>> >>>>> Sent from my mobile
>>>> >>>>>
>>>> >>>>> On 21 Apr 2013, 7:55 p.m., "Rahul Sharnagat"
>>>> >>>>> <[email protected]> wrote:
>>>> >>>>>
>>>> >>>>>> Hi Dimitris,
>>>> >>>>>> Thanks for the reply.
>>>> >>>>>> I am looking for some warm-up task related to this idea. I
>>>> >>>>>> have started reading about Scala and DBpedia. It should not take
>>>> >>>>>> much time to get accustomed to Scala, since I have previously
>>>> >>>>>> worked in Haskell. Please give me some direction for a warm-up
>>>> >>>>>> task.
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>> On Sun, Apr 21, 2013 at 9:39 PM, Dimitris Kontokostas
>>>> >>>>>> <[email protected]> wrote:
>>>> >>>>>>>
>>>> >>>>>>> Hi Rahul,
>>>> >>>>>>>
>>>> >>>>>>> The application period has not started yet, so there is still
>>>> >>>>>>> time left :)
>>>> >>>>>>>
>>>> >>>>>>> Did you read the idea page [1]? The description is pretty long,
>>>> >>>>>>> but you can ask about anything you don't completely understand.
>>>> >>>>>>> Everything should be clear when you write your application ;)
>>>> >>>>>>>
>>>> >>>>>>> Best,
>>>> >>>>>>> Dimitris
>>>> >>>>>>>
>>>> >>>>>>> [1] http://wiki.dbpedia.org/gsoc2013/ideas/CrowdsourceTestsAndRules
>>>> >>>>>>>
>>>> >>>>>>>
>>>> >>>>>>> On Sun, Apr 21, 2013 at 4:06 PM, Rahul Sharnagat
>>>> >>>>>>> <[email protected]> wrote:
>>>> >>>>>>>>
>>>> >>>>>>>> Hi Dimitris,
>>>> >>>>>>>>
>>>> >>>>>>>> I am Rahul Sharnagat, a master's student at IIT Bombay. I am
>>>> >>>>>>>> planning to apply for a DBpedia GSoC project.
>>>> >>>>>>>>
>>>> >>>>>>>> I am interested in the project "Crowdsource tests and
>>>> >>>>>>>> extraction rules". I am working on Named Entity Recognition
>>>> >>>>>>>> (NER) and entity mining as my master's project. I think working
>>>> >>>>>>>> with DBpedia would help me a lot with that. I interned at Yahoo
>>>> >>>>>>>> last summer, working on refining news indexes.
>>>> >>>>>>>>
>>>> >>>>>>>> I know I am late due to my final exams, but it would be great
>>>> >>>>>>>> if you could help me get started. I have been reading the
>>>> >>>>>>>> DBpedia wiki pages and have also downloaded the code from
>>>> >>>>>>>> GitHub.
>>>> >>>>>>>>
>>>> >>>>>>>>
>>>> >>>>>>>> --
>>>> >>>>>>>> Best Regards,
>>>> >>>>>>>> Rahul Sharnagat
>>>> >>>>>>>> CSE MTech, IITB
>>>> >>>>>>>>
>>>> >>>>>>>>
>>>> >>>>>>>>
>>>> >>>>>>>>
>>>> >>>>>>>> _______________________________________________
>>>> >>>>>>>> Dbpedia-gsoc mailing list
>>>> >>>>>>>> [email protected]
>>>> >>>>>>>> https://lists.sourceforge.net/lists/listinfo/dbpedia-gsoc
>>>> >>>>>>>>
>>>> >>>>>>>
>>>> >>>>>>>
>>>> >>>>>>>
>>>> >>>>>>> --
>>>> >>>>>>> Kontokostas Dimitris
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>> --
>>>> >>>>>> Best Regards,
>>>> >>>>>> Rahul Sharnagat
>>>> >>>>>> CSE MTech, IITB
>>>> >>>>>> H14, B505
>>>> >>>>>> +91.9860.451.056
>>>> >>>>
>>>> >>>>
>>>> >>>>
>>>> >>>>
>>>> >>>> --
>>>> >>>> Best Regards,
>>>> >>>> Rahul Sharnagat
>>>> >>>> CSE MTech, IITB
>>>> >>>> H14, B505
>>>> >>>> +91.9860.451.056
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>> --
>>>> >>> Kontokostas Dimitris
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >> --
>>>> >> Best Regards,
>>>> >> Rahul Sharnagat
>>>> >> CSE MTech, IITB
>>>> >>
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> > Best Regards,
>>>> > Rahul Sharnagat
>>>> > CSE MTech, IITB
>>>> > H14, B505
>>>> > +91.9860.451.056
>>>> >
>>>> >
>>>> ------------------------------------------------------------------------------
>>>> > Try New Relic Now & We'll Send You this Cool Shirt
>>>> > New Relic is the only SaaS-based application performance monitoring
>>>> service
>>>> > that delivers powerful full stack analytics. Optimize and monitor your
>>>> > browser, app, & servers with just a few lines of code. Try New Relic
>>>> > and get this awesome Nerd Life shirt!
>>>> http://p.sf.net/sfu/newrelic_d2d_apr
>>>> > _______________________________________________
>>>> > Dbpedia-gsoc mailing list
>>>> > [email protected]
>>>> > https://lists.sourceforge.net/lists/listinfo/dbpedia-gsoc
>>>> >
>>>>
>>>
>>>
>>>
>>> --
>>> Best Regards,
>>> Rahul Sharnagat
>>> CSE MTech, IITB
>>> H14, B505
>>> +91.9860.451.056
>>>
>>
>>
>>
>> --
>> Kontokostas Dimitris
>>
>
>
>
> --
> Best Regards,
> Rahul Sharnagat
> CSE MTech, IITB
> H14, B505
> +91.9860.451.056
>
--
Best Regards,
Rahul Sharnagat
CSE MTech, IITB
H14, B505
+91.9860.451.056