On 12/28/2011 12:28 PM, Patrick Cassidy wrote:
>
> The way an ontology *should be* is the way it will be most useful to those
> who intend to use it.
One trouble is that 'useful' depends on the use. Knowledge base A
might be able to attain hyperprecision easily for task B, but only give
the 75% accuracy that people settle for today for task C. From the
viewpoint of task B, KB A is great. A "reconciled" database that gives
90% accuracy for both might be good enough to write a paper about but
won't be commercially viable for either task B or C.
> ... Cyc ...
>
Cyc inevitably gets mentioned when the commonsense domain
comes up. Even if one does accept the conventional wisdom that Cyc was
a failure, the failure of Cyc doesn't mean that the commonsense domain
is intractable any more than the thousands of failed attempts at flight
before the Wright Brothers meant that airplanes were impossible.
The adoption cost is a killer, however, and is one that will
probably need to be radically reduced or eliminated if a Cyc-like
product is to become mainstream.
My take is that Freebase and DBpedia are an extensional approach
to the commonsense domain (defining "person" with a long list of people
and their attributes) rather than the intensional approach taken by Cyc
and SUMO (does a person have two arms and two legs? is a person a
member of the same species as Carl Linnaeus? is a corporation a
person? In what sense is Frodo Baggins a person?)
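To make the distinction concrete, here is a minimal Python sketch. The names, the toy data and the one-line "definition" are made up for illustration; none of it is taken from Freebase, DBpedia, Cyc or SUMO.

```python
# Extensional: "person" is whatever appears in an enumerated list of
# instances and their attributes. (Hypothetical toy data.)
PERSON_EXTENSION = {
    "Carl Linnaeus": {"occupation": "botanist"},
    "Patrick Cassidy": {},
}

def is_person_extensional(name):
    """True if the name is in the enumerated list, nothing more."""
    return name in PERSON_EXTENSION

# Intensional: "person" is whatever satisfies a stated definition.
# (Hypothetical rule; a real ontology would need far richer axioms.)
def is_person_intensional(entity):
    """True if the entity meets the toy definition of a person."""
    return (entity.get("species") == "Homo sapiens"
            and not entity.get("fictional", False))

linnaeus = {"name": "Carl Linnaeus", "species": "Homo sapiens"}
frodo = {"name": "Frodo Baggins", "species": "Hobbit", "fictional": True}
```

The extensional check can only answer for names already on the list, while the intensional check can answer for anything but is only as good as its definition. The Frodo case shows how quickly a one-line definition has to start making judgment calls.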
I think the extensional and intensional approaches will both be
useful, but I think that computers need a logical framework that
covers everything only about as much as people do. People generally don't
need to think about cooking and quantum mechanics at the same time, and
if somebody does, they'll invent their own framework. The only way a
framework is going to be useful is if it is actually used, and a
framework that "supports" oodles of hypothetical use cases that don't
actually get used won't be usable for any of them.
> What is actually needed is not wide agreement on a massive terminology of
> hundreds of thousands of
> terms, but only on a basic **defining vocabulary** of a few thousand terms
> that is sufficient to describe accurately any specialized concept one would
> want to define.
If that's the case, why don't you use SUMO, which is trying
to accomplish exactly that?
My answer would be that the vocabulary of a few thousand terms
leaves you alone with the grounding problem. With a very large
terminology (say the set of 3M DBpedia resources) you can, on the other
hand, apply methods that work statistically, and even if you can't
find the "correct" chain of inference you can find a large number of
chains that support correct conclusions.
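As a rough illustration of the statistical point, suppose each chain of inference that supports a conclusion has some chance of being sound. The sketch below is one reading of the argument, not a description of any existing system: the function name, the independence assumption and the numbers are all made up.

```python
from math import prod

def combined_support(chain_confidences):
    """Noisy-OR: probability that at least one chain is sound,
    assuming the chains fail independently (a strong assumption)."""
    return 1.0 - prod(1.0 - p for p in chain_confidences)

# Five weak chains, each only 40% reliable (made-up numbers).
weak_chains = [0.4] * 5
print(round(combined_support(weak_chains), 3))  # prints 0.922
```

No single chain here is trustworthy, yet together they give strong support. That only holds to the extent the chains really are independent, which is hard to guarantee when they are drawn from the same source.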
The most direct criticism that can be made of Chomsky's
linguistic program is that we've never been able to use it to transplant
the "language instinct" into a computer. Yet the "language instinct"
is a facility that is part of an animal, and perhaps it needs to have
an animal attached in order to work. Perhaps not necessarily a flesh
and blood animal, but some kind of simulation of one.
Mammals are quite good at commonsense reasoning, and if you
know them well you'll discover that they're good at many of the things
where Cyc tries to extend conventional logic-based systems. To make
progress on "language" and "vocabularies" and such, I think it's
necessary to step back and look at the primary process in which humans
and animals do probabilistic commonsense inference about themselves,
their environments and each other.
> If, on the other hand, it is expected that only probabilistic information
> will be extracted from queries on the DBpedia database, suitable only for
> inspection by potential human users, then such care in formalization may not
> be required. But it would still be helpful, and wouldn't add a lot of work
> to what is being done. The main effort is in carefully specifying the
> meanings of the relations being used, to avoid ambiguity and duplication.
>
Well, one trouble is that DBpedia is not a database of people,
places and things. It's a database of Wikipedia pages.
I could let this bother me because I'm interested in the kind of
ontology that the Greeks were interested in. What sort of things
exist? Wikipedia's sense of what a "thing" is is the kind of thing
that would drive any sensitive person nuts unless they decided to accept
it as it is.
For example, DBpedia has no concept that corresponds to
"special/exceptional". It doesn't distinguish between "Gingerbread" and
"Gingerbread House" but recognizes five or so senses of "Gingerbread
man". It's rather hard to prove that a person or book is notable in
Wikipedia but it seems impossible for a video game to not be notable.
Every episode of "Star Trek" has its own Wikipedia page, but you'll
find no episodes of "General Hospital". Sometimes it is hard to
determine what exact "thing" some Wikipedia pages are about.
Wikipedia's p.o.v. is not the consistent p.o.v. of an ontological
engineer, but is the result of a battle between inclusionists and
deletionists. It's approximately consistent because people fix obvious
inconsistencies.
DBpedia attains hyperprecision by being focused -- it documents
Wikipedia, not the world. I wouldn't say that it's smarter than Cyc,
but it's certainly more human. Once you take on more responsibility
than DBpedia takes on, you'll find yourself introducing errors of a
different kind.
By 2025, yes, I think we'll have an "encyclopedia of the
situation", more or less the content of Wikipedia in some logical form
that can be queried. What I see in 2012, however, is that it's
possible to derive heuristics that can very accurately make specific
distinctions between "things"... It's possible to clone human faculties
without cloning the human. It might take a bundle of 500 faculties to
build an IX system that could build DBpedia 2025.
_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion