Re: [Dbpedia-discussion] DBpedia ontology

Patrick Cassidy Thu, 29 Dec 2011 05:42:38 -0800

Just a note on a couple of points raised by Paul Houle:

[PC] >> What is actually needed is not wide agreement on a massive
>> terminology of hundreds of thousands of terms, but only on a basic
>> **defining vocabulary** of a few thousand terms that is sufficient to
>> describe accurately any specialized concept one would want to define.


[PH] If that's the case,  why don't you use SUMO,  which is trying 
to accomplish exactly that?

No, that was not the intended purpose of SUMO, a project in which I
participated.  SUMO could be used as a starter ontology to create the
required inventory of semantic primitives, but would need supplementation.
Closer to the required ontology is the COSMO ontology (see
http://www.micra.com/COSMO/), which I have put together using elements of
OpenCyc, SUMO, BFO, and DOLCE, along with other elements not contained in
any of those.  The COSMO has ontological representations of all of the words
in the Longman dictionary defining vocabulary, which are used to create the
dictionary definitions of all of the words in the Longman dictionary.
Analogously, for any given set of specialized ontologies used in different
applications, there will be some set of primitive elements sufficient to
create the logical specifications of all of the elements in the domain
ontologies.  The COSMO is intended to contain all of those primitive
elements, and to be supplemented as needed if new primitives are discovered.


But I am assuming that the ontology for DBpedia will necessarily develop
from the interests of the contributors.  It takes effort to find
relationships among different domains and to reconcile different viewpoints,
but that task can be made more efficient, and the product more accurate, by
identifying the semantic primitives used in common.

It appears, from what I have read about DBpedia thus far, that some effort
has already been made to reconcile different terminologies in porting the
infobox data into the common data store.  That is a very good start, but
examining the ontology shows that a lot more work is needed.

One thing I am trying to learn is exactly how DBpedia is being used.
Those uses will determine how the ontology might be modified to be more
effective.

[PH] >> The only way a framework is going to be useful is if it is actually
>> used, and a framework that "supports" oodles of hypothetical use cases
>> that don't actually get used won't be usable for any of them.

  Yes, and that is why I want to learn how the DBpedia ontology is being
used.

[PH] >> Wikipedia's p.o.v. is not the consistent p.o.v. of an ontological
>> engineer, but is the result of a battle between inclusionists and
>> deletionists.  It's approximately consistent because people fix obvious
>> inconsistencies.

   Yes, the Wikipedia itself may have inconsistencies, but it is intended
for use by people who may be able to interpret the linguistic phrases in
their proper context.  The ontology, however, does not have to have that
problem.  Wherever there are inconsistent theories, they can be represented
in the ontology as different **theories**, which do not have to be logically
consistent (the CYC "microtheories" are one example).  The base vocabulary
of logical primitives, however, will be consistent, and the same vocabulary
of primitives will be able to **logically describe** the different theories
so that the differences will be precisely recognized.  
   To give a trivial example, one can state proposition [A] and proposition
[not A].  If "A" is definable by the inventory of semantic primitives, then
both of these inconsistent theories can be represented by the same
consistent ontology.
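The [A] / [not A] point can be made concrete with a small Python sketch. It is loosely inspired by the idea of Cyc microtheories but is not Cyc's (or COSMO's) actual representation; the names `Prop`, `Ontology`, `assert_in`, `is_consistent`, and `differences`, and the theory names, are hypothetical. Each assertion is tagged with the theory it belongs to, consistency is checked per theory, and because both theories use the same primitive vocabulary the exact point of disagreement can be computed.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Prop:
    """A proposition over one primitive term; positive=False means its negation."""
    term: str
    positive: bool = True

    def negate(self) -> "Prop":
        return Prop(self.term, not self.positive)


@dataclass
class Ontology:
    """Toy store (hypothetical) in which every assertion belongs to a named theory."""
    primitives: set[str]
    theories: dict[str, set[Prop]] = field(default_factory=dict)

    def assert_in(self, theory: str, prop: Prop) -> None:
        # Only terms from the shared primitive vocabulary may be asserted.
        if prop.term not in self.primitives:
            raise ValueError(f"{prop.term!r} is not in the primitive vocabulary")
        self.theories.setdefault(theory, set()).add(prop)

    def is_consistent(self, theory: str) -> bool:
        # A single theory is inconsistent if it holds some proposition and its negation.
        props = self.theories.get(theory, set())
        return all(p.negate() not in props for p in props)

    def differences(self, t1: str, t2: str) -> set[Prop]:
        # Propositions in t1 whose negation is asserted in t2: the exact points of disagreement.
        other = self.theories.get(t2, set())
        return {p for p in self.theories.get(t1, set()) if p.negate() in other}


# Usage: [A] and [not A] live in different theories of the same ontology.
onto = Ontology(primitives={"A"})
onto.assert_in("TheoryOne", Prop("A"))
onto.assert_in("TheoryTwo", Prop("A", positive=False))
print(onto.is_consistent("TheoryOne"), onto.is_consistent("TheoryTwo"))  # True True
print(onto.differences("TheoryOne", "TheoryTwo"))  # {Prop(term='A', positive=True)}
```

The store never requires that theories agree with each other; it only requires that each theory be internally consistent and expressed in the shared primitives.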

    The point here is that the DBpedia ontology **can** have a logically
consistent representation of all of the information in the Wikipedia
infoboxes, and that information can be used for precise automated reasoning.
It is not as difficult as it may appear to accomplish this, but it does
require that one make the effort.

   If, however, none of the DBpedia users wants to use the DBpedia for
precise reasoning, then the extra effort may be superfluous.  That is why I
would like to hear from people who are using the DBpedia ontology.

   

[PH] >> My answer would be that the vocabulary of a few thousand terms
>> leaves you alone with the grounding problem.  With a very large
>> terminology (say the set of 3M dbpedia resources) you can,  on the other
>> hand,  apply methods that work statistically,  and even if you can't
>> find the "correct" chain of inference you can find a large number of
>> chains that support correct conclusions.

  You have a "grounding problem" regardless of the size of the vocabulary.
Finding the defining primitives merely allows one to identify the minimum
set of concepts that need physical grounding.  Statistical methods do allow,
in theory, arbitrarily precise results, provided that one has a
near-infinite set of correlated examples for precisely those cases one wants
to correlate.  I don't believe that the structure of the Wikipedia or of
DBpedia fits those criteria, but would be quite fascinated if any user has an
example of such statistical usage.

   Pat


Patrick Cassidy
MICRA Inc.
[email protected]
908-561-3416
================= =====================

-----Original Message-----
From: Paul Houle [mailto:[email protected]] 
Sent: Wednesday, December 28, 2011 8:20 PM
To: Patrick Cassidy
Cc: 'dbpedia-discussion'
Subject: Re: [Dbpedia-discussion] DBpedia ontology

  On 12/28/2011 12:28 PM, Patrick Cassidy wrote:
>
>    The way an ontology *should be* is the way it will be most useful to
> those who intend to use it.
    One trouble is that 'useful' depends on the use.  Knowledge base A 
might be able to attain hyperprecision easily for task B,  but only give 
the 75% accuracy that people settle for today for task C.  From the 
viewpoint of task B,  KB A is great.  A "reconciled" database that gives 
90% accuracy for both might be good enough to write a paper about but 
won't be commercially viable for either task B or C.
> ... Cyc ...
>
      Cyc inevitably gets mentioned when the commonsense domain 
comes up.  Even if one does accept the conventional wisdom that Cyc was 
a failure,  the failure of Cyc doesn't mean that the commonsense domain 
is intractable any more than the thousands of failed attempts at flight 
before the Wright Brothers meant that airplanes were impossible.

      The adoption cost is a killer,  however,  and is one that will 
probably need to be radically reduced or eliminated if a Cyc-like 
product is to become mainstream.

      My take is that Freebase and DBpedia are an extensional approach 
to the commonsense domain (defining "person" with a long list of people 
and their attributes) rather than the intensional approach taken by Cyc 
and SUMO (does a person have two arms and two legs?  is a person a 
member of the same species as Carl Linnaeus?  is a corporation a 
person?  In what sense is Frodo Baggins a person?)
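The extensional/intensional distinction can be illustrated with a short Python sketch. The member list and the `species` attribute below are toy assumptions for illustration only; they are not drawn from Freebase, DBpedia, Cyc, or SUMO, and the function names are hypothetical.

```python
# Extensional definition: "person" is simply the listed members (toy data, hypothetical).
PERSON_EXTENSION = {"Carl Linnaeus", "Ada Lovelace"}


def is_person_extensional(name: str) -> bool:
    # Membership is decided only by whether the name appears on the list.
    return name in PERSON_EXTENSION


# Intensional definition: "person" is whatever satisfies a property test.
# The test chosen here is an illustrative assumption, not a real ontology's axiom.
def is_person_intensional(entity: dict) -> bool:
    return entity.get("species") == "Homo sapiens"


frodo = {"name": "Frodo Baggins", "species": "hobbit", "fictional": True}
linnaeus = {"name": "Carl Linnaeus", "species": "Homo sapiens"}

print(is_person_extensional("Frodo Baggins"))  # False: not on the list
print(is_person_intensional(linnaeus))         # True
print(is_person_intensional(frodo))            # False under this particular test
```

The extensional version says nothing about an entity that is not listed; the intensional version generalizes to new entities but forces an explicit decision on edge cases such as Frodo Baggins or a corporation.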

      I think the extensional and intensional approaches will both be 
useful,  but I think that computers will need a logical framework that 
covers everything only about as much as people do.  People generally don't 
need to think about cooking and quantum mechanics at the same time,  and 
if somebody does,  they'll invent their own framework.  The only way a 
framework is going to be useful is if it is actually used,  and a 
framework that "supports" oodles of hypothetical use cases that don't 
actually get used won't be usable for any of them.
> What is actually needed is not wide agreement on a massive terminology of
> hundreds of thousands of terms, but only on a basic **defining vocabulary**
> of a few thousand terms that is sufficient to describe accurately any
> specialized concept one would want to define.
         If that's the case,  why don't you use SUMO,  which is trying 
to accomplish exactly that?

        My answer would be that the vocabulary of a few thousand terms 
leaves you alone with the grounding problem.  With a very large 
terminology (say the set of 3M dbpedia resources) you can,  on the other 
hand,  apply methods that work statistically,  and even if you can't 
find the "correct" chain of inference you can find a large number of 
chains that support correct conclusions.
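The idea of letting many inference chains "vote" can be sketched in Python. This is a toy illustration of evidence aggregation under stated assumptions, not how DBpedia or any particular system performs inference: the graph, the relation names, and the `fictional_or_real` rule are all hypothetical, and each chain counts equally rather than being weighted by reliability.

```python
from collections import Counter
from typing import Callable, Iterator, Optional

Edge = tuple[str, str, str]  # (subject, relation, object)


def chains(graph: dict[str, list[tuple[str, str]]], start: str,
           max_depth: int) -> Iterator[list[Edge]]:
    """Yield every relation path from `start` with 1..max_depth edges, never revisiting a node."""
    stack: list[tuple[str, list[Edge]]] = [(start, [])]
    while stack:
        node, path = stack.pop()
        visited = {start} | {obj for _, _, obj in path}
        for rel, obj in graph.get(node, []):
            if obj in visited:
                continue
            new_path = path + [(node, rel, obj)]
            yield new_path
            if len(new_path) < max_depth:
                stack.append((obj, new_path))


def vote(graph: dict[str, list[tuple[str, str]]], start: str,
         conclusion_of: Callable[[list[Edge]], Optional[str]],
         max_depth: int = 3) -> Counter:
    """Count how many chains support each conclusion; chains mapping to None are ignored."""
    counts: Counter = Counter()
    for chain in chains(graph, start, max_depth):
        label = conclusion_of(chain)
        if label is not None:
            counts[label] += 1
    return counts


# Hypothetical toy data: one noisy chain (via a biography) supports the wrong conclusion.
GRAPH = {
    "Frodo_Baggins": [("appearsIn", "The_Lord_of_the_Rings"),
                      ("appearsIn", "The_Hobbit"),
                      ("characterIn", "Middle-earth_legendarium"),
                      ("mentionedIn", "Tolkien_biography")],
    "The_Lord_of_the_Rings": [("genre", "Fiction")],
    "The_Hobbit": [("genre", "Fiction")],
    "Middle-earth_legendarium": [("genre", "Fiction")],
    "Tolkien_biography": [("genre", "Nonfiction")],
}


def fictional_or_real(chain: list[Edge]) -> Optional[str]:
    # Hypothetical rule: a chain ending in a genre node supports a conclusion.
    end = chain[-1][2]
    return {"Fiction": "fictional", "Nonfiction": "real"}.get(end)


votes = vote(GRAPH, "Frodo_Baggins", fictional_or_real)
print(votes.most_common())  # [('fictional', 3), ('real', 1)]
```

No single chain is trusted here; the majority of supporting chains carries the conclusion even though one chain is wrong.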

        The most direct criticism that can be made of Chomsky's 
linguistic program is that we've never been able to use it to transplant 
the "language instinct" into a computer.  Yet.  The "language instinct" 
is a facility that is part of an animal,  and perhaps it needs to have 
an animal attached in order to work.  Perhaps not necessarily a flesh 
and blood animal,  but some kind of simulation of one.

        Mammals are quite good at commonsense reasoning,  and if you 
know them well you'll discover that they're good at many of the things 
where Cyc tries to extend conventional logic-based systems.  To make 
progress on "language" and "vocabularies" and such,  I think it's 
necessary to step back and look at the primary process in which humans 
and animals do probabilistic commonsense inference about themselves,  
their environments and each other.
> If, on the other hand, it is expected that only probabilistic information
> will be extracted from queries on the DBpedia database, suitable only for
> inspection by potential human users, then such care in formalization may
> not be required.  But it would still be helpful, and wouldn't add a lot of
> work to what is being done.  The main effort is in carefully specifying the
> meanings of the relations being used, to avoid ambiguity and duplication.
>
     Well,  one trouble is that DBpedia is not a database of people,  
places and things.  It's a database of Wikipedia pages.

      I could let this bother me because I'm interested in the kind of 
ontology that the Greeks were interested in.  What sort of things 
exist?   Wikipedia's sense of what a "thing" is is the kind of thing 
that would drive any sensitive person nuts unless they decided to accept 
it as it is.

      For example,  DBpedia has no concept that corresponds to 
"special/exceptional".  It doesn't distinguish between "Gingerbread" and 
"Gingerbread House" but recognizes five or so senses of "Gingerbread 
man".  It's rather hard to prove that a person or book is notable in 
Wikipedia but it seems impossible for a video game to not be notable.  
Every episode of "Star Trek" has its own Wikipedia page,  but you'll 
find no episodes of "General Hospital".  Sometimes it is hard to 
determine what exact "thing" some Wikipedia pages are about.

      Wikipedia's p.o.v. is not the consistent p.o.v. of an ontological 
engineer,  but is the result of a battle between inclusionists and 
deletionists.  It's approximately consistent because people fix obvious 
inconsistencies.

       DBpedia attains hyperprecision by being focused -- it documents 
Wikipedia,  not the world.  I wouldn't say that it's smarter than Cyc,  
but it's certainly more human.  Once you take on more responsibility 
than DBpedia takes on,  you'll find yourself introducing errors of a 
different kind.

       By 2025,  yes,  I think we'll have an "encyclopedia of the 
situation",  more or less the content of Wikipedia in some logical form 
that can be queried.  What I see in 2012,  however,  is that it's 
possible to derive heuristics that can very accurately make specific 
distinctions between "things"...  It's possible to clone human faculties 
without cloning the human.  It might take a bundle of 500 faculties to 
build an IX system that could build DBpedia 2025.




------------------------------------------------------------------------------
Ridiculously easy VDI. With Citrix VDI-in-a-Box, you don't need a complex
infrastructure or vast IT resources to deliver seamless, secure access to
virtual desktops. With this all-in-one solution, easily deploy virtual 
desktops for less than the cost of PCs and save 60% on VDI infrastructure 
costs. Try it free! http://p.sf.net/sfu/Citrix-VDIinabox
_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion

