Hi Gaurav, I'm sorry - I don't have time to tell you exactly which parts of the code have to be changed. For someone familiar with Scala it should be fairly simple to figure it out. Maybe some other developers can help you, so I'm forwarding your mail to the list again.
Regards,
JC

On Thu, Mar 7, 2013 at 3:01 PM, gaurav pant <[email protected]> wrote:
> Hi Jona,
>
> Thanks for your suggestion.
>
> I am not familiar with the overall architecture of the extraction
> framework, and I do not know Scala either. But if you tell me exactly
> which files to change, I can make the required changes.
>
> So far I understand that the parameter should be added to the
> extract_config file, but I don't know which file invokes
> "ExtractionJob.scala".
>
> Please guide me further. Thanks for all the help.
>
>
> On Thu, Mar 7, 2013 at 5:06 PM, Jona Christopher Sahnwaldt <[email protected]>
> wrote:
>>
>> Hi Gaurav,
>>
>> The simplest way to filter out unmodified pages is probably to add a
>> filter in ExtractionJob.scala [1]. We don't yet have configurable
>> filters, so you will have to modify the source code. You basically
>> have to change this line:
>>
>>     if (namespaces.contains(page.title.namespace)) {
>>
>> to something like
>>
>>     if (page.timestamp >= minimalTimestamp &&
>>         namespaces.contains(page.title.namespace)) {
>>
>> And of course you have to add boilerplate code that reads
>> minimalTimestamp from the config file and passes it on to
>> ExtractionJob.
>>
>> Cheers,
>> JC
>>
>> [1]
>> https://github.com/dbpedia/extraction-framework/blob/master/dump/src/main/scala/org/dbpedia/extraction/dump/extract/ExtractionJob.scala
>>
>> On Thu, Mar 7, 2013 at 7:47 AM, gaurav pant <[email protected]> wrote:
>> > Hi All,
>> >
>> > Thanks, Dimitris, for your help.
>> >
>> > I would like one more confirmation from you.
>> >
>> > I have just gone through the code of InfoboxExtractor. It seems to me
>> > that the code processes the data page by page (<page>..</page>). If I
>> > remove all the unmodified pages from the "pages-articles" dump using
>> > a Perl/Python script and then apply Infobox extraction or Abstract
>> > extraction, we will get only the updated triples as output, like
>> > DBpedia Live for English.
>> >
>> > Please correct me if I am wrong.
>> >
>> > Thanks
>> >
>> >
>> > On Wed, Mar 6, 2013 at 5:51 PM, Dimitris Kontokostas <[email protected]>
>> > wrote:
>> >>
>> >> Hi Gaurav,
>> >>
>> >> You are correct!
>> >> Cheers,
>> >> Dimitris
>> >>
>> >>
>> >> On Wed, Mar 6, 2013 at 2:05 PM, gaurav pant <[email protected]> wrote:
>> >>>
>> >>> Hi,
>> >>>
>> >>> Greetings for the day!
>> >>>
>> >>> I have extracted the lines below from one of the "pages-articles"
>> >>> files available at
>> >>>
>> >>> "http://en.wikipedia.org/wiki/Wikipedia:Database_download#Other_languages".
>> >>> If I am not wrong, the line marked in red below (the <timestamp>
>> >>> element) denotes the last-modified timestamp of the page. Please
>> >>> correct me if I am wrong.
>> >>>
>> >>> "<page>
>> >>>     <title>Alan Smithee</title>
>> >>>     <ns>0</ns>
>> >>>     <id>1</id>
>> >>>     <revision>
>> >>>       <id>114215698</id>
>> >>>       <parentid>114215658</parentid>
>> >>>       <timestamp>2013-02-14T21:00:17Z</timestamp>
>> >>>       <contributor>
>> >>>         <ip>2003:58:A507:6A01:1C37:DB74:A237:E121</ip>
>> >>>       </contributor>
>> >>>       <comment>/* Entstehung */</comment>
>> >>> "
>> >>>
>> >>> --
>> >>> Regards
>> >>> Gaurav Pant
>> >>> +91-7709196607,+91-9405757794
>> >>>
>> >>>
>> >>>
>> >>> ------------------------------------------------------------------------------
>> >>> Symantec Endpoint Protection 12 positioned as A LEADER in The
>> >>> Forrester
>> >>> Wave(TM): Endpoint Security, Q1 2013 and "remains a good choice" in
>> >>> the
>> >>> endpoint security space. For insight on selecting the right partner to
>> >>> tackle endpoint security challenges, access the full report.
>> >>> http://p.sf.net/sfu/symantec-dev2dev
>> >>> _______________________________________________
>> >>> Dbpedia-discussion mailing list
>> >>> [email protected]
>> >>> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> Kontokostas Dimitris
>> >
>> >
>> > --
>> > Regards
>> > Gaurav Pant
>> > +91-7709196607,+91-9405757794
>
>
> --
> Regards
> Gaurav Pant
> +91-7709196607,+91-9405757794
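
For anyone picking this up from the list: the timestamp filter suggested in the thread can be tried out in isolation before touching the framework. What follows is only a minimal, self-contained sketch and not the framework's actual code. The `Page` case class, the `TimestampFilter` object, the `shouldExtract` helper, and the use of `java.time.Instant` for timestamps are all assumptions made for illustration. In the real ExtractionJob the namespace is read via `page.title.namespace`, the type of `page.timestamp` may differ, and `minimalTimestamp` would still have to be read from the config file and passed on.

```scala
import java.time.Instant

// Hypothetical stand-in for the framework's page type; it carries only the
// fields the filter needs. The real class has a different shape.
final case class Page(title: String, namespace: Int, timestamp: Instant)

object TimestampFilter {
  // True when the page is in an allowed namespace and its last revision is
  // at or after minimalTimestamp (the ">=" comparison from the thread).
  def shouldExtract(page: Page, namespaces: Set[Int], minimalTimestamp: Instant): Boolean =
    !page.timestamp.isBefore(minimalTimestamp) && namespaces.contains(page.namespace)

  def main(args: Array[String]): Unit = {
    // Assumed cutoff; in the framework this value would come from the config file.
    val minimalTimestamp = Instant.parse("2013-02-01T00:00:00Z")
    val namespaces = Set(0) // main article namespace only

    val pages = Seq(
      Page("Alan Smithee", 0, Instant.parse("2013-02-14T21:00:17Z")),      // recent, namespace 0
      Page("Old article", 0, Instant.parse("2012-11-30T08:15:00Z")),       // too old
      Page("Talk:Alan Smithee", 1, Instant.parse("2013-02-20T10:00:00Z"))  // wrong namespace
    )

    // Only pages passing both checks would be handed to the extractors.
    pages.filter(shouldExtract(_, namespaces, minimalTimestamp))
      .foreach(p => println(p.title))
  }
}
```

With the sample data above, only "Alan Smithee" passes both checks. Keeping the check in a small pure function makes it easy to test separately from the code that reads the dump.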
