Just a note on a couple of points raised by Paul Houle:

[PC]
>> What is actually needed is not wide agreement on a massive terminology of hundreds of thousands of
>> terms, but only on a basic **defining vocabulary** of a few thousand terms
>> that is sufficient to describe accurately any specialized concept one would
>> want to define.
[PH]
>> If that's the case, why don't you use SUMO, which is trying to accomplish exactly that?

No, that was not the intended purpose of SUMO, a project in which I participated. SUMO could be used as a starter ontology to create the required inventory of semantic primitives, but would need supplementation. Closer to the required ontology is the COSMO ontology (see http://www.micra.com/COSMO/), which I have put together using elements of OpenCyc, SUMO, BFO, and DOLCE, along with other elements not contained in any of those. The COSMO has ontological representations of all of the words in the Longman dictionary defining vocabulary, which are used to create the dictionary definitions of all of the words in the Longman dictionary. Analogously, for any given set of specialized ontologies used in different applications, there will be some set of primitive elements sufficient to create the logical specifications of all of the elements in the domain ontologies. The COSMO is intended to contain all of those primitive elements, and to be supplemented as needed if new primitives are discovered.

But I am assuming that the ontology for DBpedia will necessarily develop from the interests of the contributors. It takes effort to find relationships among different domains and to reconcile different viewpoints, but that task can be made more efficient, and the product more accurate, by identifying the semantic primitives used in common. It appears, from what I have read about DBpedia thus far, that some effort has already been made to reconcile different terminologies in porting the infobox data into the common data store. That is a very good start, but examining the ontology shows that a lot more work is needed. One thing I am trying to learn is exactly how the DBpedia is being used. Those uses will determine how the ontology might be modified to be more effective.
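As a rough illustration of the defining-vocabulary idea described above, the following Python sketch checks whether every term in a small set of definitions reduces to a fixed set of primitives. It is not COSMO's format or tooling; all term names, the dictionary layout, and the helper functions are invented for the example.

```python
# Hypothetical toy data: none of these names are taken from COSMO, SUMO, or Longman.
PRIMITIVES = {"Person", "Organization", "Place", "memberOf", "locatedIn"}

# Each defined term maps to the set of terms used in its definition.
DEFINITIONS = {
    "Employee": {"Person", "memberOf", "Organization"},
    "Headquarters": {"Place", "Organization", "locatedIn"},
    "Branch": {"Organization", "locatedIn", "Place"},
    "BranchManager": {"Employee", "Branch"},
}


def undefined_terms(term, definitions, primitives, _seen=None):
    """Return terms reachable from `term` that are neither primitive nor defined."""
    seen = set() if _seen is None else _seen
    if term in primitives or term in seen:
        # Primitives need no definition; revisited terms were already expanded
        # (this also stops the recursion on circular definitions).
        return set()
    seen.add(term)
    if term not in definitions:
        return {term}
    missing = set()
    for used in definitions[term]:
        missing |= undefined_terms(used, definitions, primitives, seen)
    return missing


def needs_new_primitives(definitions, primitives):
    """Map each defined term to the undefined terms its definition depends on."""
    report = {}
    for term in definitions:
        missing = undefined_terms(term, definitions, primitives)
        if missing:
            report[term] = missing
    return report


if __name__ == "__main__":
    # An empty report means the primitive inventory suffices for these definitions.
    print(needs_new_primitives(DEFINITIONS, PRIMITIVES))
```

A non-empty report would correspond to the case mentioned above where the inventory has to be supplemented with a newly discovered primitive. Note that this toy check does not flag circular definitions; it only stops recursing on them.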
[PH]
>> The only way a
>> framework is going to be useful is if it is actually used, and a
>> framework that "supports" oodles of hypothetical use cases that don't
>> actually get used won't be usable for any of them.

Yes, and that is why I want to learn how the DBpedia ontology is being used.

[PH]
>> Wikipedia's p.o.v. is not the consistent p.o.v. of an ontological
>> engineer, but is the result of a battle between inclusionists and
>> deletionists. It's approximately consistent because people fix obvious
>> inconsistencies.

Yes, the Wikipedia itself may have inconsistencies, but it is intended for use by people who may be able to interpret the linguistic phrases in their proper context. The ontology, however, does not have to have that problem. Wherever there are inconsistent theories, they can be represented in the ontology as different **theories**, which do not have to be logically consistent (the CYC "microtheories" are one example). The base vocabulary of logical primitives, however, will be consistent, and the same vocabulary of primitives will be able to **logically describe** the different theories so that the differences will be precisely recognized. To give a trivial example, one can state proposition [A] and proposition [not A]. If "A" is definable by the inventory of semantic primitives, then both of these inconsistent theories can be represented by the same consistent ontology.

The point here is that the DBpedia ontology **can** have a logically consistent representation of all of the information in the Wikipedia infoboxes, and that information can be used for precise automated reasoning. It is not as difficult as it may appear to accomplish this, but it does require that one make the effort. If, however, none of the DBpedia users wants to use the DBpedia for precise reasoning, then the extra effort may be superfluous. That is why I would like to hear from people who are using the DBpedia ontology.
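The [A] / [not A] example above can be sketched in a few lines of Python. This is only a toy model loosely inspired by the microtheory idea, not Cyc's actual representation; the theory names, the propositions "A" and "B", and the helper functions are all assumptions made for illustration. Each theory is a set of (proposition, truth value) pairs drawn from one shared vocabulary.

```python
# Shared vocabulary of propositions (hypothetical placeholders).
VOCABULARY = {"A", "B"}

# Two theories over the same vocabulary: they agree on B and disagree on A.
THEORIES = {
    "theory_1": {("A", True), ("B", True)},
    "theory_2": {("A", False), ("B", True)},
}


def is_consistent(theory):
    """A theory is consistent here if no proposition is both asserted and denied."""
    return not any((prop, not value) in theory for prop, value in theory)


def uses_only_vocabulary(theory, vocabulary):
    """Check that every proposition in the theory comes from the shared vocabulary."""
    return all(prop in vocabulary for prop, _ in theory)


def disagreements(first, second):
    """Propositions on which the two theories assert opposite truth values."""
    return {prop for prop, value in first if (prop, not value) in second}


if __name__ == "__main__":
    t1, t2 = THEORIES["theory_1"], THEORIES["theory_2"]
    print(is_consistent(t1), is_consistent(t2))  # each theory on its own
    print(is_consistent(t1 | t2))                # the naive merge of both
    print(disagreements(t1, t2))                 # where exactly they differ
```

Keeping the theories separate is what lets each stay consistent while the shared vocabulary makes the point of disagreement explicit; merging them into one set of assertions would produce the contradiction.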
[PH]
>> My answer would be that the vocabulary of a few thousand terms
>> leaves you alone with the grounding problem. With a very large
>> terminology (say the set of 3M dbpedia resources) you can, on the other
>> hand, apply methods that work statistically, and even if you can't
>> find the "correct" chain of inference you can find a large number of
>> chains that support correct conclusions.

You have a "grounding problem" regardless of the size of the vocabulary. Finding the defining primitives merely allows one to identify the minimum set of concepts that need physical grounding. Statistical methods do allow, in theory, arbitrarily precise results, provided that one has a near-infinite set of correlated examples for precisely those cases one wants to correlate. I don't believe that the structure of the Wikipedia or of DBpedia fits those criteria, but would be quite fascinated if any user has an example of such statistical usage.

Pat

Patrick Cassidy
MICRA Inc.
[email protected]
908-561-3416

================= =====================

-----Original Message-----
From: Paul Houle [mailto:[email protected]]
Sent: Wednesday, December 28, 2011 8:20 PM
To: Patrick Cassidy
Cc: 'dbpedia-discussion'
Subject: Re: [Dbpedia-discussion] DBpedia ontology

On 12/28/2011 12:28 PM, Patrick Cassidy wrote:
>
> The way an ontology *should be* is the way it will be most useful to those
> who intend to use it.

One trouble is that 'useful' depends on the use. Knowledge base A might be able to attain hyperprecision easily for task B, but only give the 75% accuracy that people settle for today for task C. From the viewpoint of task B, KB A is great. A "reconciled" database that gives 90% accuracy for both might be good enough to write a paper about but won't be commercially viable for either task B or C.

> ... Cyc ...
>

Cyc inevitably gets mentioned when the commonsense domain comes up.
Even if one does accept the conventional wisdom that Cyc was a failure, the failure of Cyc doesn't mean that the commonsense domain is intractable any more than the thousands of failed attempts at flight before the Wright Brothers meant that airplanes were impossible. The adoption cost is a killer, however, and is one that will probably need to be radically reduced or eliminated if a Cyc-like product is to become mainstream.

My take is that Freebase and DBpedia are an extensional approach to the commonsense domain (defining "person" with a long list of people and their attributes) rather than the intensional approach taken by Cyc and SUMO (does a person have two arms and two legs? is a person a member of the same species as Carl Linnaeus? is a corporation a person? In what sense is Frodo Baggins a person?).

I think the extensional and intensional approaches will both be useful, but I think that computers will need a logical framework that covers everything about as much as people do. People generally don't need to think about cooking and quantum mechanics at the same time, and if somebody does, they'll invent their own framework. The only way a framework is going to be useful is if it is actually used, and a framework that "supports" oodles of hypothetical use cases that don't actually get used won't be usable for any of them.

> What is actually needed is not wide agreement on a massive terminology of hundreds of thousands of
> terms, but only on a basic **defining vocabulary** of a few thousand terms
> that is sufficient to describe accurately any specialized concept one would
> want to define.

If that's the case, why don't you use SUMO, which is trying to accomplish exactly that?

My answer would be that the vocabulary of a few thousand terms leaves you alone with the grounding problem.
With a very large terminology (say the set of 3M dbpedia resources) you can, on the other hand, apply methods that work statistically, and even if you can't find the "correct" chain of inference you can find a large number of chains that support correct conclusions.

The most direct criticism that can be made of Chomsky's linguistic program is that we've never been able to use it to transplant the "language instinct" into a computer. Yet the "language instinct" is a facility that is part of an animal, and perhaps it needs to have an animal attached in order to work. Perhaps not necessarily a flesh and blood animal, but some kind of simulation of one.

Mammals are quite good at commonsense reasoning, and if you know them well you'll discover that they're good at many of the things where Cyc tries to extend conventional logic-based systems. To make progress on "language" and "vocabularies" and such, I think it's necessary to step back and look at the primary process in which humans and animals do probabilistic commonsense inference about themselves, their environments and each other.

> If, on the other hand, it is expected that only probabilistic information
> will be extracted from queries on the DBpedia database, suitable only for
> inspection by potential human users, then such care in formalization may not
> be required. But it would still be helpful, and wouldn't add a lot of work
> to what is being done. The main effort is in carefully specifying the
> meanings of the relations being used, to avoid ambiguity and duplication.
>

Well, one trouble is that DBpedia is not a database of people, places and things. It's a database of Wikipedia pages. I could let this bother me because I'm interested in the kind of ontology that the Greeks were interested in. What sort of things exist? Wikipedia's sense of what a "thing" is is the kind of thing that would drive any sensitive person nuts unless they decided to accept it as it is.
For example, DBpedia has no concept that corresponds to "special/exceptional". It doesn't distinguish between "Gingerbread" and "Gingerbread House" but recognizes five or so senses of "Gingerbread man". It's rather hard to prove that a person or book is notable in Wikipedia but it seems impossible for a video game to not be notable. Every episode of "Star Trek" has its own Wikipedia page, but you'll find no episodes of "General Hospital". Sometimes it is hard to determine what exact "thing" some Wikipedia pages are about.

Wikipedia's p.o.v. is not the consistent p.o.v. of an ontological engineer, but is the result of a battle between inclusionists and deletionists. It's approximately consistent because people fix obvious inconsistencies.

DBpedia attains hyperprecision by being focused -- it documents Wikipedia, not the world. I wouldn't say that it's smarter than Cyc, but it's certainly more human. Once you take on more responsibility than DBpedia takes on, you'll find yourself introducing errors of a different kind.

By 2025, yes, I think we'll have an "encyclopedia of the situation", more or less the content of Wikipedia in some logical form that can be queried. What I see in 2012, however, is that it's possible to derive heuristics that can very accurately make specific distinctions between "things"... It's possible to clone human faculties without cloning the human. It might take a bundle of 500 faculties to build an IX system that could build DBpedia 2025.

_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
