On 2/3/11 10:03 AM, Paul Houle wrote:
> On 2/2/2011 4:10 PM, Lushan Han wrote:
>> Hi,
>>
>> FYI, the class type dbpedia-owl:City is missing for capitals, for
>> example, http://dbpedia.org/page/London.
>> As a result, the example on the DBpedia website, "Cities with more than
>> 2 million inhabitants", fails to return capitals.
>>
>> Best regards,
>> Lushan
>>
> Here we go again.
>
> (1) There's a fundamental ontological problem here. Technically,
> London (like Tokyo) is not a city. London is a metropolitan area
> composed of 33 boroughs such as Westminster, Kensington, Hackney
> and Camden. The actual "City of London" is the financial district and
> is about a square mile in area.
>
> (2) DBpedia has poor recall for many common types, such as human
> settlements and people. The underlying issue is that it extracts type
> information from infoboxes, which are used inconsistently. There
> isn't a single "city infobox"; rather, different infoboxes are used
> in different regions and different areas. The signal is imperfect
> (many people have no infobox at all), and the set of rules that
> DBpedia uses to extract types is also imperfect. The flip side is
> that the precision of types in DBpedia is excellent: I've found
> quite literally only a handful of cases where things were blatantly
> mistyped.
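To make the precision/recall trade-off described in (2) concrete, here is a minimal sketch of infobox-driven typing. It is not DBpedia's extraction framework: the template names, the mapping table, and the article names are all hypothetical. It only illustrates why a mapping-based extractor tends to be precise (every type it assigns comes from an explicit rule) while missing entities whose infobox is absent or unmapped.

```python
# Hypothetical infobox-template -> class mapping. These template names are
# illustrative stand-ins, NOT DBpedia's actual mapping rules.
INFOBOX_TO_CLASS = {
    "Infobox city (region A)": "dbpedia-owl:City",
    "Infobox person": "dbpedia-owl:Person",
}


def extract_types(articles):
    """Type each article whose infobox template has a known mapping.

    `articles` maps an article title to its infobox template name, or None.
    Articles with no infobox, or with an unmapped one, get no type at all.
    """
    typed = {}
    for title, infobox in articles.items():
        cls = INFOBOX_TO_CLASS.get(infobox)
        if cls is not None:
            typed[title] = cls
    return typed


def precision_recall(cls, typed, truth):
    """Precision and recall of `cls` assignments against a ground-truth dict.

    An empty set of assignments (or of relevant items) is scored 1.0 by
    convention, since the ratio is otherwise undefined.
    """
    assigned = {t for t, c in typed.items() if c == cls}
    relevant = {t for t, c in truth.items() if c == cls}
    hits = assigned & relevant
    precision = len(hits) / len(assigned) if assigned else 1.0
    recall = len(hits) / len(relevant) if relevant else 1.0
    return precision, recall


# Hypothetical articles: City_B uses a regional infobox the rules don't know
# about, and Person_C has no infobox at all.
ARTICLES = {
    "City_A": "Infobox city (region A)",
    "City_B": "Infobox city (region B)",
    "Person_C": None,
}
TRUTH = {
    "City_A": "dbpedia-owl:City",
    "City_B": "dbpedia-owl:City",
    "Person_C": "dbpedia-owl:Person",
}

typed = extract_types(ARTICLES)
print(typed)
print(precision_recall("dbpedia-owl:City", typed, TRUTH))
```

In this toy run, only City_A is typed, so precision for dbpedia-owl:City is 1.0 while recall is 0.5: the same shape of result described above.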
So why don't you make a linkset that addresses these issues? You can
tweak the DBpedia TBox or make your own. I can load it into a Named Graph
distinct from the main DBpedia graph. Then it can be evaluated en route
to becoming part of the main graph, if you choose.

I performed a similar exercise [1] (which I hope becomes the norm) with
@danbri a few days ago. This process is a nice stop-gap while Wikipedia
evolves with respect to structured data.

> ----
>
> The answer to (1) in commonsense reasoning systems is to maintain
> "vernacular types" that reflect popular understandings.
>
> It's still tricky; the classification of human settlements is
> difficult because there's no clear line between "city", "town" and
> "village". People in other language areas, such as de, have concepts
> that are similar but different, such as "Stadt" and "Dorf". A
> vernacular type that would work in the en-zone is to say "anything
> that has 'town' in its name is a :Town", but a place that's called a
> "Town" in the U.S. could be a small city, a village, a rural area
> where 20-30% of people live in a few concentrated areas (the "Town"
> that I write a tax check to every year), or a centerless suburban or
> post-urban area like Derry, N.H.
>
> In New York State there are approximately 20 types of local
> government, and the law for the establishment of local governments is
> different in all 50 states of the :United_States, and different again
> in the 200 or so other countries out there. One could imagine a very
> detailed data model that represents this precisely, but it would be a
> difficult model to work with, and you'd still need some kind of
> vernacular layer to make it easier to use.
>
> As for (2), the easy thing to do is get your types from Freebase.
> Precision in Freebase is slightly worse than in DBpedia, but recall is
> better by a factor of 2 or more for many types.
> Freebase has used both machine learning and crowdsourcing techniques
> to produce a type system that's easy to work with.

Yes, so make a linkbase for now, as I suggested.

Links:

1. http://danbri.org/words/2011/02/01/658

Kingsley

> _______________________________________________
> Dbpedia-discussion mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion

-- 

Regards,

Kingsley Idehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen
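The linkset Kingsley suggests and the en-zone vernacular rule Paul describes can be combined in a minimal sketch: apply the "'town' in its name" rule to resource labels and emit plain N-Triples that could then be loaded into a Named Graph separate from the main DBpedia graph (the loading step is not shown). The vernacular class IRI, the resource IRIs, and the labels are hypothetical placeholders; only the rdf:type IRI is the standard one. Matching "town" as a whole word is also an assumption, since the rule as stated doesn't say whether "Georgetown" should count.

```python
import re

# Hypothetical IRI standing in for the thread's ":Town" vernacular class.
VERNACULAR_TOWN = "http://example.org/vernacular#Town"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Match "town" as a whole word, case-insensitively, so "Georgetown" is
# not caught. This is a design choice, not part of the rule as stated.
TOWN_WORD = re.compile(r"\btown\b", re.IGNORECASE)


def town_linkset(labels):
    """Return N-Triples lines asserting :Town for each matching resource.

    `labels` maps a resource IRI to its English label.
    """
    triples = []
    for iri, label in labels.items():
        if TOWN_WORD.search(label):
            triples.append(f"<{iri}> <{RDF_TYPE}> <{VERNACULAR_TOWN}> .")
    return triples


# Hypothetical resources. Place_B gets no triple even if it is legally a
# town, which is the kind of ambiguity the thread describes.
LABELS = {
    "http://example.org/resource/Town_A": "Town of A",
    "http://example.org/resource/Place_B": "B",
    "http://example.org/resource/Georgetown_C": "Georgetown C",
}

for line in town_linkset(LABELS):
    print(line)
```

Keeping the output as a standalone N-Triples file is what allows it to live in its own graph and be evaluated before anything is merged, as suggested above.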
