Re: [Dbpedia-discussion] DBpedia ontology

Paul Houle Wed, 28 Dec 2011 17:22:01 -0800

  On 12/28/2011 12:28 PM, Patrick Cassidy wrote:
>
>    The way an ontology *should be* is the way it will be most useful to those
> who intend to use it.
    One trouble is that 'useful' depends on the use.  Knowledge base A 
might be able to attain hyperprecision easily for task B,  but only give 
the 75% accuracy that people settle for today for task C.  From the 
viewpoint of task B,  KB A is great.  A "reconciled" database that gives 
90% accuracy for both might be good enough to write a paper about but 
won't be commercially viable for either task B or C.
> ... Cyc ...
>
      Cyc inevitably gets mentioned when the commonsense domain 
comes up.  Even if one does accept the conventional wisdom that Cyc was 
a failure,  the failure of Cyc doesn't mean that the commonsense domain 
is intractable any more than the thousands of failed attempts at flight 
before the Wright Brothers meant that airplanes were impossible.


      The adoption cost is a killer,  however,  and is one that will 
probably need to be radically reduced or eliminated if a Cyc-like 
product is to become mainstream.

      My take is that Freebase and DBpedia are an extensional approach 
to the commonsense domain (defining "person" with a long list of people 
and their attributes) rather than the intensional approach taken by Cyc 
and SUMO (does a person have two arms and two legs?  is a person a 
member of the same species as Carl Linnaeus?  is a corporation a 
person?  In what sense is Frodo Baggins a person?)
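
      To make the distinction concrete, here is a minimal Python sketch.
It is only an illustration: the data, the field names ("species",
"fictional") and both function names are hypothetical, and none of it is
taken from DBpedia, Freebase, Cyc or SUMO.

```python
# Toy illustration only; all names and data here are hypothetical.

# Extensional: "person" is defined by enumerating known instances.
KNOWN_PEOPLE = {"Carl Linnaeus", "Ada Lovelace"}

def is_person_extensional(name):
    """A thing is a person if it appears in the enumerated list."""
    return name in KNOWN_PEOPLE

# Intensional: "person" is defined by a rule over an entity's attributes.
def is_person_intensional(entity):
    """A thing is a person if it satisfies the stated criteria."""
    return (entity.get("species") == "Homo sapiens"
            and not entity.get("fictional", False))

linnaeus = {"species": "Homo sapiens"}
frodo = {"species": "Hobbit", "fictional": True}

print(is_person_extensional("Carl Linnaeus"))  # in the list
print(is_person_extensional("Frodo Baggins"))  # not in the list
print(is_person_intensional(linnaeus))         # satisfies the rule
print(is_person_intensional(frodo))            # fails the rule
```

      The extensional check can only answer for names somebody has already
listed, while the intensional check forces a decision about the hard
cases (corporations, fictional characters) at the moment the rule is
written.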

      I think the extensional and intensional approaches will both be 
useful,  but I think that computers will need a logical framework that 
covers everything about as much as people do.  People generally don't 
need to think about cooking and quantum mechanics at the same time,  and 
if somebody does,  they'll invent their own framework.  The only way a 
framework is going to be useful is if it is actually used,  and a 
framework that "supports" oodles of hypothetical use cases that don't 
actually get used won't be usable for any of them.
> What is actually needed is not wide agreement on a massive terminology of 
> hundreds of thousands of
> terms, but only on a basic **defining vocabulary** of a few thousand terms
> that is sufficient to describe accurately any specialized concept one would
> want to define.
         If that's the case,  why don't you use SUMO,  which is trying 
to accomplish exactly that?

        My answer would be that the vocabulary of a few thousand terms 
leaves you alone with the grounding problem.  With a very large 
terminology (say the set of 3M DBpedia resources) you can,  on the other 
hand,  apply methods that work statistically,  and even if you can't 
find the "correct" chain of inference you can find a large number of 
chains that support correct conclusions.

        The most direct criticism that can be made of Chomsky's 
linguistic program is that we've never been able to use it to transplant 
the "language instinct" into a computer.  Yet the "language instinct" 
is a facility that is part of an animal,  and perhaps it needs to have 
an animal attached in order to work.  Perhaps not necessarily a flesh 
and blood animal,  but some kind of simulation of one.

        Mammals are quite good at commonsense reasoning,  and if you 
know them well you'll discover that they're good at many of the things 
where Cyc tries to extend conventional logic-based systems.  To make 
progress on "language" and "vocabularies" and such,  I think it's 
necessary to step back and look at the primary process in which humans 
and animals do probabilistic commonsense inference about themselves,  
their environments and each other.
> If, on the other hand, it is expected that only probabilistic information
> will be extracted from queries on the DBpedia database, suitable only for
> inspection by potential human users, then such care in formalization may not
> be required.  But it would still be helpful, and wouldn't add a lot of work
> to what is being done.  The main effort is in carefully specifying the
> meanings of the relations being used, to avoid ambiguity and duplication.
>
     Well,  one trouble is that DBpedia is not a database of people,  
places and things.  It's a database of Wikipedia pages.

      I could let this bother me because I'm interested in the kind of 
ontology that the Greeks were interested in.  What sort of things 
exist?   Wikipedia's sense of what a "thing" is is the kind of thing 
that would drive any sensitive person nuts unless they decided to accept 
it as it is.

      For example,  DBpedia has no concept that corresponds to 
"special/exceptional".  It doesn't distinguish between "Gingerbread" and 
"Gingerbread House" but recognizes five or so senses of "Gingerbread 
man".  It's rather hard to prove that a person or book is notable in 
Wikipedia but it seems impossible for a video game to not be notable.  
Every episode of "Star Trek" has its own Wikipedia page,  but you'll 
find no episodes of "General Hospital".  Sometimes it is hard to 
determine what exact "thing" some Wikipedia pages are about.

      Wikipedia's p.o.v. is not the consistent p.o.v. of an ontological 
engineer,  but is the result of a battle between inclusionists and 
deletionists.  It's approximately consistent because people fix obvious 
inconsistencies.

       DBpedia attains hyperprecision by being focused -- it documents 
Wikipedia,  not the world.  I wouldn't say that it's smarter than Cyc,  
but it's certainly more human.  Once you take on more responsibility 
than DBpedia takes on,  you'll find yourself introducing errors of a 
different kind.

       By 2025,  yes,  I think we'll have an "encyclopedia of the 
situation",  more or less the content of Wikipedia in some logical form 
that can be queried.  What I see in 2012,  however,  is that it's 
possible to derive heuristics that can very accurately make specific 
distinctions between "things"...  It's possible to clone human faculties 
without cloning the human.  It might take a bundle of 500 faculties to 
build an IX system that could build DBpedia 2025.



