Hi Jona,
I attached those files earlier, but out of habit I hit just Reply, so they went
only to Dimitris. Please also give me some pointers on the
doubts mentioned below regarding the proposal.

---------- Forwarded message ----------
From: Rahul Sharnagat <[email protected]>
Date: Sat, Apr 27, 2013 at 1:53 PM
Subject: Re: [Dbpedia-gsoc] GSoC : Crowdsource tests and extraction rules
To: Dimitris Kontokostas <[email protected]>

Hi Christopher, Dimitris,

Thanks, Dimitris, for the proxy help. The dump download went smoothly. But while
running the extraction with extraction.default.properties, there was an error
about a missing wikipedias.csv. I went through the mail archive and found a
solution here
<http://www.mail-archive.com/[email protected]/msg03921.html>:
changing languages to en instead of 10000-. A new error has now occurred, for
which I have attached the logs: the framework tries to find arwiki even though
there is none in base-dir.

*Regarding the proposal*

I have started writing my proposal and am having some trouble proposing a
tentative solution. Since there are three objectives for this idea, here is my
rough thinking on each:

- Extending the DBpedia mappings wiki so that editors can provide the rules for
  the data formats that need to be extracted.
  I looked into the dataparser code. Currently, taking month information as an
  example, the config file extraction.config.dataparser defines how months in
  each language are parsed. So a solution would be a module that stores,
  accesses, and specifies the rules for this information instead of hard-coding
  them in Scala. Is this what is expected from this task? As Christopher
  mentioned in issue #36, building a DSL could be a solution.

- Moving data types from the extraction code to the mappings wiki ontology.
  This seems easy to do; I just haven't found the code that modifies the wiki,
  or how the mappings wiki gets this information from the code. (I am not very
  clear on this and need some pointers into the code.)

- Extending the mappings wiki so that tests can be specified on a wiki page and
  the community can contribute.
  Do I need to elaborate extensively on what kind of tests I should be
  implementing?
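To make the first point concrete, here is a minimal sketch of what "rules as
data instead of Scala code" could look like, using month parsing as the example.
This is purely illustrative: the class and method names are invented and the
tables would really come from a config file or the mappings wiki, not the actual
DBpedia extraction-framework API (written here in Java rather than Scala just as
a sketch).

```java
import java.util.Map;

// Hypothetical sketch: month-name parsing rules kept as per-language data,
// instead of being hard-coded in the parser. All names here are invented,
// not the real DBpedia extraction-framework API.
public class MonthRules {
    // In the real framework this table would live in a config file or on
    // the mappings wiki, editable by the community; here it is inlined.
    private static final Map<String, String[]> MONTHS = Map.of(
        "en", new String[]{"january", "february", "march", "april", "may",
                           "june", "july", "august", "september", "october",
                           "november", "december"},
        "de", new String[]{"januar", "februar", "märz", "april", "mai",
                           "juni", "juli", "august", "september", "oktober",
                           "november", "dezember"}
    );

    // Returns the 1-based month number, or -1 if the name is unknown.
    public static int parseMonth(String lang, String name) {
        String[] names = MONTHS.get(lang);
        if (names == null) return -1;
        String needle = name.toLowerCase();
        for (int i = 0; i < names.length; i++) {
            if (names[i].equals(needle)) return i + 1;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(parseMonth("en", "March")); // 3
        System.out.println(parseMonth("de", "März"));  // 3
    }
}
```

The point of the sketch: adding a new language then means adding a data row, not
writing parser code, which is exactly what crowdsourcing the rules would enable.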
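On the testing point, here is a rough sketch of what a community-contributed
extraction test could be: an input value as it appears in an infobox, plus the
output the extractor is expected to produce, with a runner that reports
mismatches. Again, every name here is hypothetical, and the trivial "parser"
only stands in for a real DataParser.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of a wiki-specified extraction test case. None of
// these names are the actual DBpedia extraction-framework API.
public class ExtractionTest {
    final String language;  // e.g. "en"
    final String input;     // raw infobox value
    final String expected;  // expected extraction result

    ExtractionTest(String language, String input, String expected) {
        this.language = language;
        this.input = input;
        this.expected = expected;
    }

    // Runs all tests against a parser and returns the failing ones.
    static List<ExtractionTest> failures(List<ExtractionTest> tests,
                                         Function<ExtractionTest, String> parser) {
        return tests.stream()
                    .filter(t -> !t.expected.equals(parser.apply(t)))
                    .toList();
    }

    public static void main(String[] args) {
        // A trivial stand-in "parser" that just trims whitespace; the test
        // cases themselves would come from the mappings wiki.
        Function<ExtractionTest, String> trimParser = t -> t.input.trim();
        List<ExtractionTest> tests = List.of(
            new ExtractionTest("en", "  42 km  ", "42 km"),
            new ExtractionTest("en", "42 km", "42 km")
        );
        System.out.println(failures(tests, trimParser).size()); // 0
    }
}
```

The design question the sketch raises is the one asked above: how such
(input, expected) pairs would be entered and stored on the wiki, and how the
framework would pick them up and run them against each parser.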
It would be very helpful if you could elaborate on the testing part. My
understanding is that we need a module that takes input from mappings-wiki users
for a particular language, a way to define these tests, and a way to validate
the extraction results against them. Can you give me some pointers regarding the
implementation of the test cases?

Thanks

On Thu, Apr 25, 2013 at 12:19 PM, Dimitris Kontokostas <[email protected]> wrote:
> Hi Rahul,
>
> You should put your main effort into your application, but I think this task
> will also help you get a better idea of what to expect.
>
> Regarding the proxy, we have the following launcher in dump/pom.xml;
> please uncomment and adapt the proxy settings:
>
> <launcher>
>   <id>download</id>
>   <mainClass>org.dbpedia.extraction.dump.download.Download</mainClass>
>   <!--
>   <jvmArgs>
>     <jvmArg>-Dhttp.proxyHost=proxy.server.com</jvmArg>
>     <jvmArg>-Dhttp.proxyPort=80</jvmArg>
>     <jvmArg>-Dhttp.nonProxyHosts="localhost|127.0.0.1"</jvmArg>
>   </jvmArgs>
>   -->
>   <!-- ../run download config=download.properties -->
> </launcher>
>
> On Thu, Apr 25, 2013 at 5:29 AM, Rahul Sharnagat <[email protected]> wrote:
>> Hi Jona,
>> I think I know the problem. I am on my institute's network, which works
>> through a proxy server. To get Maven working I had to set the proxy
>> settings in settings.xml and pass it to the mvn command, but currently I
>> am putting it in the $HOME/.m2/ folder. Does the wiki dump download
>> respect the Maven proxy settings or the global http_proxy environment
>> variable? Maybe this is the source of the error. I will try to get on a
>> no-proxy network and try again.
>>
>> On Thu, Apr 25, 2013 at 4:09 AM, Jona Christopher Sahnwaldt
>> <[email protected]> wrote:
>>> On 24 April 2013 20:55, Rahul Sharnagat <[email protected]> wrote:
>>> > Hi Dimitris,
>>> > For the last few days I have been trying to understand the dataparser
>>> > and mapping code. I also went a little higher in the hierarchy to
>>> > understand the dependencies.
>>> > Things are getting clearer now, but it will take some more time to
>>> > understand all the nuances. I also successfully installed the
>>> > extraction framework.
>>> > But there is one problem with getting the dump to work on. As per the
>>> > documentation (here and here), I could not find the
>>> > download.properties file in the dump folder of the master branch. But
>>> > I explored the folder and found download.minimal.properties. I tweaked
>>> > it according to the instructions for my requirements, but I am getting
>>> > an error (attached are the full debug log and the tweaked
>>> > minimal.properties). I tried to find a similar error in the archived
>>> > messages but could not. Can you help me in this regard?
>>>
>>> Strange. Could you just try again? It works for me. Maybe it was a
>>> temporary problem at Wikimedia. Or maybe something is wrong with your
>>> network? What does http://dumps.wikimedia.org/enwiki/ look like in
>>> your browser?
>>>
>>> I updated extraction-framework to the latest version from GitHub,
>>> copied your download.minimal.properties file into my dump/ folder,
>>> changed the value of base-dir and executed
>>>
>>>   ../clean-install-run download config=download.minimal.properties
>>>
>>> Below is an excerpt from the result.
>>>
>>> Cheers,
>>> JC
>>>
>>> [INFO] launcher 'download' selected => org.dbpedia.extraction.dump.download.Download
>>> done: 0 -
>>> todo: 1 - wiki=en,locale=en
>>> downloading 'http://dumps.wikimedia.org/enwiki/' to
>>> '/Users/jcsahnwaldt/tmp/enwiki/index.html'
>>> read 3.6132812 KB of 3.6132812 KB in 0.014 seconds (258.0915 KB/s)
>>> downloading 'http://dumps.wikimedia.org/enwiki/20130403/' to
>>> '/Users/jcsahnwaldt/tmp/enwiki/20130403/index.html'
>>> read 102.23535 KB of 102.23535 KB in 0.907 seconds (112.71813 KB/s)
>>> date page 'http://dumps.wikimedia.org/enwiki/20130403/' has all files
>>> [pages-articles.xml.bz2]
>>> downloading 'http://dumps.wikimedia.org/enwiki/20130403/enwiki-20130403-pages-articles.xml.bz2'
>>> to '/Users/jcsahnwaldt/tmp/enwiki/20130403/enwiki-20130403-pages-articles.xml.bz2'
>>>
>>> > I am also reading the DBpedia mappings wiki to understand how the
>>> > ontology is created and how infobox-to-ontology mapping is done, and
>>> > to relate it to the code. Since little more than a week is left before
>>> > the final proposal, I want to create a good draft by the 1st. I will
>>> > try to send a rough draft by tomorrow.
>>> >
>>> > Thanks.
>>> >
>>> > On Tue, Apr 23, 2013 at 11:58 AM, Rahul Sharnagat
>>> > <[email protected]> wrote:
>>> >> Thanks, Dimitris.
>>> >> I will look into this issue and the related code, and get back to you
>>> >> if I face any problems.
>>> >>
>>> >> On Mon, Apr 22, 2013 at 6:07 PM, Dimitris Kontokostas
>>> >> <[email protected]> wrote:
>>> >>> Hi Rahul,
>>> >>>
>>> >>> A very good warm-up task for this idea is issue #36
>>> >>> (https://github.com/dbpedia/extraction-framework/issues/36).
>>> >>> With this task you will get to know the parser internals and see the
>>> >>> actual need to crowd-source the rules.
>>> >>>
>>> >>> Take a first look and we'll be available for further details.
>>> >>>
>>> >>> Cheers,
>>> >>> Dimitris
>>> >>>
>>> >>> On Mon, Apr 22, 2013 at 5:02 AM, Rahul Sharnagat
>>> >>> <[email protected]> wrote:
>>> >>>> Sorry, forgot to add the mailing list. Just hit the reply button. :)
>>> >>>>
>>> >>>> On Mon, Apr 22, 2013 at 2:19 AM, Dimitris Kontokostas
>>> >>>> <[email protected]> wrote:
>>> >>>>> Please put the mailing list in cc :)
>>> >>>>>
>>> >>>>> Cheers,
>>> >>>>> Dimitris
>>> >>>>>
>>> >>>>> ----
>>> >>>>> Sent from my mobile
>>> >>>>>
>>> >>>>> On 21 Apr 2013 at 7:55 p.m., "Rahul Sharnagat"
>>> >>>>> <[email protected]> wrote:
>>> >>>>>> Hi Dimitris,
>>> >>>>>> Thanks for the reply.
>>> >>>>>> I am looking for a warm-up task related to this idea. I have
>>> >>>>>> started reading about Scala and DBpedia. It should not take much
>>> >>>>>> time to get accustomed to Scala, since I have previously worked
>>> >>>>>> in Haskell. Please give me some direction for a warm-up task.
>>> >>>>>>
>>> >>>>>> On Sun, Apr 21, 2013 at 9:39 PM, Dimitris Kontokostas
>>> >>>>>> <[email protected]> wrote:
>>> >>>>>>> Hi Rahul,
>>> >>>>>>>
>>> >>>>>>> The application period has not started yet, so there is still
>>> >>>>>>> time left :)
>>> >>>>>>>
>>> >>>>>>> Did you read the idea page [1]? The description is pretty big,
>>> >>>>>>> but you can ask about anything you don't understand completely.
>>> >>>>>>> Everything should be clear when you write your application ;)
>>> >>>>>>>
>>> >>>>>>> Best,
>>> >>>>>>> Dimitris
>>> >>>>>>>
>>> >>>>>>> [1] http://wiki.dbpedia.org/gsoc2013/ideas/CrowdsourceTestsAndRules
>>> >>>>>>>
>>> >>>>>>> On Sun, Apr 21, 2013 at 4:06 PM, Rahul Sharnagat
>>> >>>>>>> <[email protected]> wrote:
>>> >>>>>>>> Hi Dimitris,
>>> >>>>>>>>
>>> >>>>>>>> I am Rahul Sharnagat, a master's student at IIT Bombay.
>>> >>>>>>>> I am
>>> >>>>>>>> planning to apply for a DBpedia GSoC project.
>>> >>>>>>>>
>>> >>>>>>>> I am interested in the project "Crowdsource tests and
>>> >>>>>>>> extraction rules". I am working on named entity recognition
>>> >>>>>>>> (NER) and entity mining as my master's project, and I think
>>> >>>>>>>> working with DBpedia would help me a lot with that. I interned
>>> >>>>>>>> at Yahoo last summer, working on refining news indexes.
>>> >>>>>>>>
>>> >>>>>>>> I know I am late due to my final exams, but it would be great
>>> >>>>>>>> if you could help me get started. I have been reading the
>>> >>>>>>>> DBpedia wiki pages and have also downloaded the code from
>>> >>>>>>>> GitHub.
>>> >>>>>>>>
>>> >>>>>>>> --
>>> >>>>>>>> Best Regards,
>>> >>>>>>>> Rahul Sharnagat
>>> >>>>>>>> CSE MTech, IITB
>>> >>>>>>>>
>>> >>>>>>>> _______________________________________________
>>> >>>>>>>> Dbpedia-gsoc mailing list
>>> >>>>>>>> [email protected]
>>> >>>>>>>> https://lists.sourceforge.net/lists/listinfo/dbpedia-gsoc
>>> >>>>>>>
>>> >>>>>>> --
>>> >>>>>>> Kontokostas Dimitris
>>> >>>>>>
>>> >>>>>> --
>>> >>>>>> Best Regards,
>>> >>>>>> Rahul Sharnagat
>>> >>>>>> CSE MTech, IITB
>>> >>>
>>> >>> --
>>> >>> Kontokostas Dimitris
>>> >>
>>> >> --
>>> >> Best Regards,
>>> >> Rahul Sharnagat
>>> >> CSE MTech, IITB
>>> >
>>> > --
>>> > Best Regards,
>>> > Rahul Sharnagat
>>> > CSE MTech, IITB
>>
>> --
>> Best Regards,
>> Rahul Sharnagat
>> CSE MTech, IITB
>
> --
> Kontokostas Dimitris

--
Best Regards,
Rahul Sharnagat
CSE MTech, IITB
H14, B505
+91.9860.451.056
Attachments:
  extraction.default.properties (binary data)
  ext_dump (binary data)
_______________________________________________
Dbpedia-gsoc mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-gsoc
