Thank you all for the answers.

My initial idea was to save time, since I only need to run the extractor 
on the Wikipedia pages of People. As that is not possible, I will run the 
extractor on the entire dump and afterwards select the triples of class 
People using SPARQL.

Best,
Samur
On 9/17/13 12:15 PM, Jona Christopher Sahnwaldt wrote:
> On 17 September 2013 11:55, Samur Araujo <[email protected]> 
> wrote:
>> Hi Andrea, I would like to skip the parsing of wiki pages that do not fit
>> into the class that I selected.
>>
>> If a page needs to be parsed to obtain its class,
> It does, so unfortunately you can't skip the parsing.
>
>> then this filtering at
>> extraction time should work for all extractors. Why should it differ from
>> extractor to extractor?
> I don't know what you mean.
>
>> Anyway, do you know any way to do it for the LabelsExtractor, for example?
> The easiest way is probably to post-process the datasets with some
> bash commands: take the instances_types file, grep only the lines that
> contain your class, let awk (or maybe the 'cut' command) strip
> everything but the subject. Then you have a list of items that all
> have the desired type. With this list, you can process other datasets,
> either with awk using hash sets, or maybe the 'join' command will be
> enough.
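
The post-processing described above could also be sketched in Python instead of grep/awk/join. This is only a rough sketch under assumptions that are not stated in the thread: both datasets are N-Triples files with one triple per line, the subject is the first whitespace-separated token, and the class URI and all function names are illustrative.

```python
from typing import Iterable, Iterator, Set

# Assumed class URI, for illustration only; adjust to the class you need.
PERSON_CLASS = "<http://dbpedia.org/ontology/Person>"


def subjects_of_class(type_lines: Iterable[str], class_uri: str) -> Set[str]:
    """Collect the subjects of all type triples whose object is class_uri."""
    subjects: Set[str] = set()
    for line in type_lines:
        parts = line.split(None, 2)  # subject, predicate, remainder
        if len(parts) < 3 or line.startswith("#"):
            continue  # skip blank lines, comments and malformed lines
        subject, _predicate, rest = parts
        # Drop the terminating " ." of the N-Triples statement.
        obj = rest.rstrip().rstrip(".").rstrip()
        if obj == class_uri:
            subjects.add(subject)
    return subjects


def filter_by_subject(lines: Iterable[str], subjects: Set[str]) -> Iterator[str]:
    """Yield only the lines whose subject is in the given set."""
    for line in lines:
        if not line.strip():
            continue
        subject = line.split(None, 1)[0]
        if subject in subjects:
            yield line


def filter_dataset(types_path: str, data_path: str, out_path: str,
                   class_uri: str = PERSON_CLASS) -> None:
    """Keep only the triples in data_path whose subject has the given type.

    All three paths are hypothetical and supplied by the caller.
    """
    with open(types_path, encoding="utf-8") as types_file:
        subjects = subjects_of_class(types_file, class_uri)
    with open(data_path, encoding="utf-8") as data_file, \
            open(out_path, "w", encoding="utf-8") as out_file:
        out_file.writelines(filter_by_subject(data_file, subjects))
```

The set of subjects plays the role of the awk hash set: it is built once from the types file and can then be reused to filter the labels dataset or any other dataset. Whether subclasses such as Musician are covered depends on whether the types file also lists the superclass for each resource; if it does not, the subject sets for several class URIs can be combined.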
>
> The parsing is the most expensive step of the extraction. I'm not sure
> if it's BZip2 decompression or XML parsing or Wikitext parsing. But
> anyway, once you have parsed a page into the DBpedia Wikitext AST, it
> doesn't matter much anymore how many extractors you run. And since you
> can't skip the parsing, there's not much point in skipping some
> extractors.
>
> JC
>
>> Best,
>> Samur
>>
>> On 9/16/13 6:03 PM, Andrea Di Menna wrote:
>>
>> Hi Samur,
>>
>> which extractors would you like to run with those restrictions?
>> I am asking this because I think only some of the extractors would be able
>> to apply the rules as you are describing them at extraction phase.
>>
>> Cheers
>> Andrea
>>
>>
>> 2013/9/16 Samur Araujo <[email protected]>
>>> Dear DBpedians, I would like to run the DBpedia extraction framework only
>>> on one class (and its subclasses) of resources (e.g. People).
>>>
>>> If I understood correctly, currently I can only run the extractor on the
>>> entire dump that I have downloaded. I would like to output the triples
>>> only for a specific class (e.g. People). Is that possible?
>>>
>>> Assuming my class of interest is People, the extractor should also
>>> parse Musician, since Musician is a subclass of People.
>>>
>>> Has anyone ever tried this?
>>>
>>> Best,
>>>
>>> --
>>> M.Sc. Samur Araujo
>>> Ph.D Student
>>> TU Delft - Delft University of Technology - EWI
>>>
>>>
>>>
>>> _______________________________________________
>>> Dbpedia-developers mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/dbpedia-developers
>>
>>
>>
>> --
>> M.Sc. Samur Araujo
>> Ph.D Student
>> TU Delft - Delft University of Technology - EWI
>>


-- 
M.Sc. Samur Araujo
Ph.D Student
TU Delft - Delft University of Technology - EWI

