sarif ishak wrote:
> Hi,
>
> I'm sorry if I'm asking a basic question here.
>
> How does the DBpedia extractor decide the class
> (<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>) of a given article?
>
> For example, the "http://en.wikipedia.org/wiki/Honda_Integra" article is
> typed as "MeanOfTransportation", "FrontWheelDriveVehicles", "Sedans",
> "Automobile", etc.
>
> Does the extractor read only from the infobox properties (e.g. body_style)?
>
> What if the article has no infobox, only plain explanatory text? Will
> the extractor read certain keywords in the abstract and "guess" the
> class of the article? Or is there a more complex mechanism in the
> engine that decides the type/class of the article?
>
    The DBpedia ontology, as I understand it, is based on the
infoboxes. It's not as comprehensive as one might like, but it's
highly precise (in the sense of few false positives) and it's very
usable for many applications.

    If something has no infobox, it's not in the DBpedia ontology. If
something has two contradictory infoboxes, it can have contradictory
types: for instance, you might find a handful of items that are both
a Person and a Place.
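To make the two consequences above concrete, here is a minimal sketch of infobox-driven typing. It assumes a simple lookup table from infobox template names to ontology classes; the table entries and the `classes_for` helper are hypothetical illustrations, not DBpedia's actual mappings or extraction code.

```python
# Hypothetical lookup table from infobox template names to ontology classes.
# The real DBpedia mappings are maintained separately; these entries are
# illustrative only.
INFOBOX_TO_CLASS = {
    "Infobox automobile": "MeanOfTransportation",
    "Infobox person": "Person",
    "Infobox settlement": "Place",
}


def classes_for(infoboxes):
    """Return the set of classes implied by an article's infobox templates.

    Templates missing from the table are ignored, so an article with no
    (mapped) infobox gets no class at all, and an article carrying two
    unrelated infoboxes picks up both classes.
    """
    return {INFOBOX_TO_CLASS[name] for name in infoboxes if name in INFOBOX_TO_CLASS}


print(classes_for(["Infobox automobile"]))                      # {'MeanOfTransportation'}
print(classes_for([]))                                          # set()
print(sorted(classes_for(["Infobox person", "Infobox settlement"])))  # ['Person', 'Place']
```

The design makes precision and recall visible: a type is only ever asserted when a mapped infobox is present (few false positives), while anything without one is simply left untyped.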

    There are many other ways to assign types. For instance, the
Person type in Freebase is equivalent to the Person type in the DBpedia
ontology, and Freebase has much better recall for Persons because
they've got heuristics that find people who don't have infoboxes.

    There's also a strategy of assigning classes based on other sorts of
evidence. For instance, if you wanted to find the class of "Things
That Are In or Associated With Paris, France", you could

(i) include anything that has geographic coordinates inside the shape of
Paris
(ii) include anything that has an address in Paris
(iii) include members of certain categories associated with Paris, such as

http://en.wikipedia.org/wiki/Category:Visitor_attractions_in_Paris
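The three evidence sources above can be sketched as a simple union of tests. This is a minimal illustration under stated assumptions: `PARIS_POLYGON` is a hypothetical rectangle standing in for the real city boundary, `PARIS_CATEGORIES` is a hypothetical seed set containing only the category named above, and the `Item` record is an invented shape for the data. The coordinate test uses standard ray casting.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical, very rough outline of Paris as (longitude, latitude) vertices.
# A real system would use the actual administrative boundary.
PARIS_POLYGON = [(2.22, 48.82), (2.47, 48.82), (2.47, 48.90), (2.22, 48.90)]

# Hypothetical seed set of categories taken to indicate a Paris association.
PARIS_CATEGORIES = {"Visitor_attractions_in_Paris"}


@dataclass
class Item:
    name: str
    coords: Optional[Tuple[float, float]] = None  # (longitude, latitude)
    address_city: Optional[str] = None
    categories: List[str] = field(default_factory=list)


def point_in_polygon(x, y, polygon):
    """Ray casting: toggle 'inside' each time a rightward ray from (x, y)
    crosses a polygon edge."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y value
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def associated_with_paris(item):
    """Union of the three kinds of evidence: any single one is enough."""
    if item.coords and point_in_polygon(*item.coords, PARIS_POLYGON):
        return True  # (i) coordinates inside the shape of Paris
    if item.address_city == "Paris":
        return True  # (ii) address in Paris
    return any(c in PARIS_CATEGORIES for c in item.categories)  # (iii) category


print(associated_with_paris(Item("Eiffel Tower", coords=(2.2945, 48.8584))))  # True
print(associated_with_paris(Item("Lyon", coords=(4.83, 45.76))))  # False
```

Because a single piece of evidence is enough, this union maximizes recall; the precision cost is exactly the problem the reply turns to next.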

    Now, you'll have some problems with precision when you do this,
since bad stuff has a way of sneaking in, but there are pretty
efficient ways to control this, at least if you're working in a
particular problem domain.

    My impression is that the YAGO people did something like this
covering all of DBpedia, and that they didn't do a very good job.

_______________________________________________
Dbpedia-discussion mailing list
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
