sarif ishak wrote:
> Hi,
>
> I'm sorry if I'm asking a basic question here.
>
> How the dbpedia extractor decide the class
> (<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>) of certain articles?
>
> For example "http://en.wikipedia.org/wiki/Honda_Integra" article is
> "MeanOfTransportation", "FrontWheelDriveVehicles", "Sedans",
> "Automobile" etc.
>
> Did the extractor read from "Infobox" properties?(e.g. body_style) only?
>
> What if the article did not consist of 'infobox'? Only a pure text of
> explanation of article. Will it read certain keyword inside the
> 'abstract' and 'guess' the class of article? Or, are there more
> complex mechanism happened at the engine before it decide the
> type/class of article?
>
The dbpedia ontology, in my understanding, is based on the
infoboxes. It's not as comprehensive as one might like, but it's
highly precise (in the sense of few false positives) and it's very
usable for many applications.
If something has no infobox, it's not in the dbpedia ontology. If
something has two contradictory infoboxes it can have contradictory
types: for instance, you might find a handful of cases of an item that
is both a Person and a Place.
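To make the infobox-driven behaviour concrete, here is a minimal sketch in Python. It is not the extractor's actual code: the mapping table, the template names, and the function name types_for are all made up for illustration. It only shows why an article without an infobox gets no type and why two infoboxes can yield two conflicting types.

```python
# Hypothetical mapping from infobox template names to ontology classes.
# These entries are illustrative assumptions, not the real DBpedia mappings.
INFOBOX_TO_CLASS = {
    "Infobox automobile": "Automobile",
    "Infobox person": "Person",
    "Infobox settlement": "Place",
}


def types_for(infoboxes):
    """Return the set of classes implied by an article's infobox templates.

    Templates with no mapping are ignored, so an article with no (mapped)
    infobox ends up with an empty set of types.
    """
    return {INFOBOX_TO_CLASS[name] for name in infoboxes if name in INFOBOX_TO_CLASS}


if __name__ == "__main__":
    print(types_for([]))                                        # no infobox, no type
    print(types_for(["Infobox person", "Infobox settlement"]))  # two conflicting types
```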
There are many other ways to assign types. For
instance, the Person type in Freebase is equivalent to the Person type
in the dbpedia ontology, and Freebase has much better recall for
Persons because they've got heuristics that find people that don't have
infoboxes.
There's also a strategy of assigning classes based on other sorts of
evidence: for instance, if you wanted to find the class of "Things
That Are In or Associated With Paris, France" you could
(i) include anything that has geographic coordinates inside the shape of
Paris
(ii) include anything that has an address in Paris
(iii) find certain categories that are associated w/ Paris, such as
http://en.wikipedia.org/wiki/Category:Visitor_attractions_in_Paris
Now, you'll have some problems with precision when you do this, since
bad stuff has a way of sneaking in, but there are ways to control
this that are pretty efficient, at least if you're working in a
particular problem domain.
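The three rules above can be sketched roughly as follows. This is only an illustration under stated assumptions: the Item record, the function names, and the min_rules threshold are hypothetical, and a crude bounding box with approximate coordinates stands in for the real shape of Paris.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

# Assumption: a rough bounding box replaces the true outline of Paris.
# Values are approximate: (min_lat, min_lon, max_lat, max_lon).
PARIS_BBOX = (48.815, 2.224, 48.902, 2.470)

# Categories treated as evidence; only the one mentioned in the text.
PARIS_CATEGORIES = {"Category:Visitor_attractions_in_Paris"}


@dataclass
class Item:
    """Hypothetical record holding the evidence available for one article."""
    name: str
    coords: Optional[Tuple[float, float]] = None  # (lat, lon)
    address_city: Optional[str] = None
    categories: Set[str] = field(default_factory=set)


def paris_evidence(item: Item) -> List[str]:
    """Return the names of the rules (i), (ii), (iii) that fire for this item."""
    fired = []
    if item.coords is not None:
        lat, lon = item.coords
        min_lat, min_lon, max_lat, max_lon = PARIS_BBOX
        if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon:
            fired.append("coords")      # rule (i)
    if item.address_city == "Paris":
        fired.append("address")         # rule (ii)
    if item.categories & PARIS_CATEGORIES:
        fired.append("category")        # rule (iii)
    return fired


def in_paris_class(item: Item, min_rules: int = 1) -> bool:
    """Accept the item if at least min_rules independent rules fire."""
    return len(paris_evidence(item)) >= min_rules
```

With min_rules=1 any single piece of evidence is enough, which is where bad stuff sneaks in: an address in some other town called Paris would pass rule (ii). Requiring two rules to agree is one possible way to trade recall for precision; it is an assumption of this sketch, not necessarily the control meant above.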
My impression is that the YAGO people did something like this
covering all of dbpedia and that they didn't do a very good job.
_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion