Thank you all for the answers. My initial idea was to save time, since I only need to run the extractor on the wiki pages of people. As that is not possible, I will run the extractor on the entire dump and afterwards select the triples of class People using SPARQL.
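For reference, the post-processing that JC suggests in the quoted message below (take the instances_types file, keep the subjects of the desired class, then filter the other datasets with a hash set) could probably also be done in Python instead of grep/awk. This is only a rough sketch under assumptions: the datasets are N-Triples files with one triple per line, the function names and sample lines are made up for illustration, and subclasses such as Musician must be listed in `wanted_classes` unless the types file already lists each resource under its superclasses.

```python
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def subjects_of_class(types_lines, wanted_classes):
    """Collect the subjects whose rdf:type object is in wanted_classes."""
    subjects = set()
    for line in types_lines:
        # Split into subject, predicate and the rest ("object .").
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue  # blank or malformed line
        subj, pred, rest = parts
        # Drop the trailing " ." that terminates an N-Triples statement.
        obj = rest.rstrip().rstrip(".").rstrip()
        if pred == RDF_TYPE and obj in wanted_classes:
            subjects.add(subj)
    return subjects


def filter_by_subject(dataset_lines, subjects):
    """Yield only the triples whose subject is in the given set."""
    for line in dataset_lines:
        tokens = line.split(None, 1)
        if tokens and tokens[0] in subjects:
            yield line


# Made-up sample lines standing in for the real dataset files.
sample_types = [
    "<http://dbpedia.org/resource/Ada_Lovelace> " + RDF_TYPE
    + " <http://dbpedia.org/ontology/Person> .\n",
    "<http://dbpedia.org/resource/Delft> " + RDF_TYPE
    + " <http://dbpedia.org/ontology/Place> .\n",
]
sample_labels = [
    "<http://dbpedia.org/resource/Ada_Lovelace> "
    '<http://www.w3.org/2000/01/rdf-schema#label> "Ada Lovelace"@en .\n',
    "<http://dbpedia.org/resource/Delft> "
    '<http://www.w3.org/2000/01/rdf-schema#label> "Delft"@en .\n',
]

people = subjects_of_class(sample_types, {"<http://dbpedia.org/ontology/Person>"})
people_labels = list(filter_by_subject(sample_labels, people))
```

With real files, the same two functions should work on open file handles (the instances_types file for the first call, any other dataset for the second), since both only iterate over lines. The set of subjects has to fit in memory, which is the same trade-off as the awk hash-set approach.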
Best,
Samur

On 9/17/13 12:15 PM, Jona Christopher Sahnwaldt wrote:
> On 17 September 2013 11:55, Samur Araujo <[email protected]> wrote:
>> Hi Andrea, I would like to skip the parsing of wikipages that does not fit
>> into the class that I selected.
>>
>> If a page needs to be parsed to obtain its class,
> It does, so unfortunately you can't skip the parsing.
>
>> then this filtering at
>> extraction time should work for all extractors. Why should it differ from
>> extractor to extractor?
> I don't know what you mean.
>
>> Anyway, do you know any way to do it for the LabelsExtractor, for example?
> The easiest way is probably to post-process the datasets with some
> bash commands: take the instances_types file, grep only the lines that
> contain your class, let awk (or maybe the 'cut' command) strip
> everything but the subject. Then you have a list of items that all
> have the desired type. With this list, you can process other datasets,
> either with awk using hash sets, or maybe the 'join' command will be
> enough.
>
> The parsing is the most expensive step of the extraction. I'm not sure
> if it's BZip2 decompression or XML parsing or Wikitext parsing. But
> anyway, once you have parsed a page into the DBpedia Wikitext AST, it
> doesn't matter much anymore how many extractors you run. And since you
> can't skip the parsing, there's not much point in skipping some
> extractors.
>
> JC
>
>> Best,
>> Samur
>>
>> On 9/16/13 6:03 PM, Andrea Di Menna wrote:
>>
>> Hi Samur,
>>
>> which extractors would you like to run with those restrictions?
>> I am asking this because I think only some of the extractors would be able
>> to apply the rules as you are describing them at extraction phase.
>>
>> Cheers
>> Andrea
>>
>>
>> 2013/9/16 Samur Araujo <[email protected]>
>>> Dear DBPedians, I would like to run the DBPedia extractor framework only
>>> on one class (and its subclasses) of resources (e.g. People).
>>>
>>> If I understood well, currently I can only run the extractor in the
>>> entire dump that I have download. I would like to output the triples
>>> only for a specific class (e.g. People). Is it possible?
>>>
>>> Assuming my class of interest is People, the extractor should also
>>> parse Musician, assuming Musician as subclass of People.
>>>
>>> Does anyone have ever tried this?
>>>
>>> Best,
>>>
>>> --
>>> M.Sc. Samur Araujo
>>> Ph.D Student
>>> TU Delft - Delft University of Technology - EWI
>>>
>>> _______________________________________________
>>> Dbpedia-developers mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/dbpedia-developers
