On 17 September 2013 11:55, Samur Araujo <[email protected]> wrote:
> Hi Andrea, I would like to skip the parsing of wiki pages that do not fit
> into the class that I selected.
>
> If a page needs to be parsed to obtain its class,

It does, so unfortunately you can't skip the parsing.

> then this filtering at
> extraction time should work for all extractors. Why should it differ from
> extractor to extractor?

I don't know what you mean.

>
> Anyway, do you know any way to do it for the LabelsExtractor, for example?

The easiest way is probably to post-process the datasets with some
bash commands: take the instance_types file, grep only the lines that
contain your class, and let awk (or maybe the 'cut' command) strip
everything but the subject. That gives you a list of items that all
have the desired type. With this list, you can filter the other datasets,
either with awk using hash sets, or maybe the 'join' command will be
enough.
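
The post-processing described above can be sketched roughly as follows.
This is a minimal Python version of the same idea rather than the bash
pipeline itself, and it rests on a few assumptions: the datasets are
N-Triples with one triple per line, the types file contains a line with
the target class for every item you want (if subclass instances are not
also typed with the superclass, you would have to collect the subclass
URIs first), and the function names and sample URIs are hypothetical.

```python
import re

# One N-Triples line with a URI subject and predicate: <s> <p> object .
TRIPLE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(.*?)\s*\.\s*$')
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def subjects_of_class(type_lines, class_uri):
    """Collect the subjects typed with class_uri (the grep + cut step)."""
    wanted = "<" + class_uri + ">"
    subjects = set()
    for line in type_lines:
        m = TRIPLE.match(line)
        if m and m.group(2) == RDF_TYPE and m.group(3) == wanted:
            subjects.add(m.group(1))
    return subjects


def filter_by_subject(dataset_lines, subjects):
    """Yield only the triples whose subject is in the set (the hash set / join step)."""
    for line in dataset_lines:
        m = TRIPLE.match(line)
        if m and m.group(1) in subjects:
            yield line


if __name__ == "__main__":
    # Tiny made-up sample standing in for the types and labels datasets.
    types = [
        "<http://dbpedia.org/resource/A> <" + RDF_TYPE + "> <http://dbpedia.org/ontology/Person> .",
        "<http://dbpedia.org/resource/B> <" + RDF_TYPE + "> <http://dbpedia.org/ontology/Place> .",
    ]
    labels = [
        '<http://dbpedia.org/resource/A> <http://www.w3.org/2000/01/rdf-schema#label> "Alice"@en .',
        '<http://dbpedia.org/resource/B> <http://www.w3.org/2000/01/rdf-schema#label> "Berlin"@en .',
    ]
    people = subjects_of_class(types, "http://dbpedia.org/ontology/Person")
    for triple in filter_by_subject(labels, people):
        print(triple)
```

Both functions take any iterable of lines, so an open file object works
as well as a list. Holding the subjects in a set keeps the second pass
to one lookup per triple, which is the same trick as the awk hash set.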

The parsing is the most expensive step of the extraction. I'm not sure
if it's BZip2 decompression or XML parsing or Wikitext parsing. But
anyway, once you have parsed a page into the DBpedia Wikitext AST, it
doesn't matter much anymore how many extractors you run. And since you
can't skip the parsing, there's not much point in skipping some
extractors.

JC

> Best,
> Samur
>
> On 9/16/13 6:03 PM, Andrea Di Menna wrote:
>
> Hi Samur,
>
> which extractors would you like to run with those restrictions?
> I am asking this because I think only some of the extractors would be able
> to apply the rules you are describing during the extraction phase.
>
> Cheers
> Andrea
>
>
> 2013/9/16 Samur Araujo <[email protected]>
>>
>> Dear DBpedians, I would like to run the DBpedia extraction framework only
>> on one class (and its subclasses) of resources (e.g. People).
>>
>>    If I understood correctly, currently I can only run the extractor on the
>> entire dump that I have downloaded. I would like to output the triples
>> only for a specific class (e.g. People). Is it possible?
>>
>> Assuming my class of interest is People, the extractor should also
>> parse Musician, assuming Musician is a subclass of People.
>>
>> Has anyone ever tried this?
>>
>> Best,
>>
>> --
>> M.Sc. Samur Araujo
>> Ph.D Student
>> TU Delft - Delft University of Technology - EWI
>>
>>
>>
>> ------------------------------------------------------------------------------
>> LIMITED TIME SALE - Full Year of Microsoft Training For Just $49.99!
>> 1,500+ hours of tutorials including VisualStudio 2012, Windows 8,
>> SharePoint
>> 2013, SQL 2012, MVC 4, more. BEST VALUE: New Multi-Library Power Pack
>> includes
>> Mobile, Cloud, Java, and UX Design. Lowest price ever! Ends 9/20/13.
>>
>> http://pubads.g.doubleclick.net/gampad/clk?id=58041151&iu=/4140/ostg.clktrk
>> _______________________________________________
>> Dbpedia-developers mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/dbpedia-developers
>
>
>
>
> --
> M.Sc. Samur Araujo
> Ph.D Student
> TU Delft - Delft University of Technology - EWI
>
>
>
