On 11/22/2011 6:06 AM, Yury Katkov wrote:
>
> 1) I'm thinking of a tutorial on how to use the DBpedia/Freebase pair
> in practical projects. Is there any other similar tutorial out there,
> and would it be useful in your opinion?
     What people need here is a product, not a tutorial.

     The FactForge query editor does a "reasonable" job of combining
data from DBpedia and Freebase and making them queryable through SPARQL.
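
     To give a flavor of it, here is roughly the kind of query you can
ask of the combined store. Treat it as a sketch rather than something
FactForge will accept verbatim -- the prefixes and the Freebase
namespace here are from memory:

PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX dbo: <http://dbpedia.org/ontology/>

# Pull an English abstract from the DBpedia side, plus whatever
# Freebase URIs owl:sameAs has glued onto the same concept.
SELECT ?abstract ?fbTopic WHERE {
  <http://dbpedia.org/resource/Berlin> dbo:abstract ?abstract ;
                                       owl:sameAs   ?fbTopic .
  FILTER ( lang(?abstract) = "en" )
  FILTER ( regex(str(?fbTopic), "^http://rdf.freebase.com/") )
}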

     Like most similar things, it doesn't provide a consistent
point-of-view. If you look at the sample queries at the bottom of the
page you often see them using multiple predicates to get information
that came in through different sources. owl:sameAs gets used to glom
together Freebase and DBpedia concepts; I don't know where they got
their owl:sameAs statements, but I know that if you use the ones that
come with Wikipedia you'll find some concepts end up getting lost or
confused in big Katamari balls.
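
     Concretely, the queries end up looking something like the sketch
below, where the same fact has to be fished out of whichever source
happened to carry it. The Freebase property URI is from memory and may
not be exactly what's loaded:

PREFIX dbo: <http://dbpedia.org/ontology/>

# One fact (a birth date), two predicates, depending on whether it
# came in from DBpedia or from Freebase.
SELECT ?dob WHERE {
  { <http://dbpedia.org/resource/Douglas_Adams> dbo:birthDate ?dob }
  UNION
  { <http://dbpedia.org/resource/Douglas_Adams>
      <http://rdf.freebase.com/ns/people.person.date_of_birth> ?dob }
}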

     If you want to write simple queries and get consistently good
results you need some process where you clean the data up, decide what
you believe when there is contradictory information, and all of that.
It would really be great if we had some system that could represent
"john thinks that mary said Lady Gaga is a man," but the strategy of
throwing it all into a triple store and hoping people aren't going to
notice they're getting bad results doesn't cut it.
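
     For what it's worth, you can fake that kind of nested attribution
today with plain RDF reification -- all the ex: names below are made
up, and nothing downstream knows what to do with the result, which is
exactly the problem:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ex:  <http://example.org/>

INSERT DATA {
  # the inner claim: "Lady Gaga is a man"
  ex:claim1 a rdf:Statement ;
            rdf:subject   ex:Lady_Gaga ;
            rdf:predicate ex:gender ;
            rdf:object    ex:Male .

  # "mary said <claim1>"
  ex:claim2 a rdf:Statement ;
            rdf:subject   ex:Mary ;
            rdf:predicate ex:said ;
            rdf:object    ex:claim1 .

  # "john thinks <claim2>"
  ex:John ex:thinks ex:claim2 .
}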

> 2) can we generate new mappings or improve the extraction scripts by 
> analysing and parsing Freebase data? Is that a good idea or is there 
> some kind of redundancy there? Is it legal with respect to Google's 
> licences?
>
     I think DBpedia has a different philosophy than Freebase.

     DBpedia combines a Wikipedia dump with a rulebox that creates a
set of triples. The rulebox isn't capable of doing deep cleanup on the
data. The "correct" way to fix something in DBpedia is to fix the data
in Wikipedia. If I've got an automated process that, say, reconciles
places with Geonames and inserts geographic coordinates for 200,000
places, the only way to get that data in is to get it into Wikipedia,
and that's a very difficult proposition because there are many
different infobox templates for different kinds of locations, in which
coordinates are represented differently.
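
     You can see some of that template diversity leaking straight
through into the extracted data: depending on the infobox a place used,
its coordinates can show up under several different predicates. A rough
illustration -- the raw dbp: property names vary from template to
template, so don't take these as exhaustive:

PREFIX geo:    <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX georss: <http://www.georss.org/georss/>
PREFIX dbp:    <http://dbpedia.org/property/>

# The same place may carry coordinates in several shapes at once, or
# in only one of them, depending on the infobox it came from.
SELECT ?lat ?long ?point ?rawLat WHERE {
  OPTIONAL { <http://dbpedia.org/resource/Berlin> geo:lat  ?lat ;
                                                  geo:long ?long . }
  OPTIONAL { <http://dbpedia.org/resource/Berlin> georss:point ?point . }
  OPTIONAL { <http://dbpedia.org/resource/Berlin> dbp:latitude ?rawLat . }
}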

     On the other hand, if you want to bulk-insert data into Freebase,
or fix a fact that's wrong, it's pretty easy to do.

     DBpedia might look like a database about "topics", but it's really
a database about Wikipedia pages -- which include things like

http://en.wikipedia.org/wiki/List_of_Star_Wars_characters

     Freebase filters things like this out, because they're not really
"things". Yet these records could be very valuable for information
extraction. Wikipedia pagelinks from that page give evidence that
there's a connection between :Boba_Fett and :Han_Solo, for instance,
and there are many List pages that are very mineable even if DBpedia
doesn't mine them.
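
     For example, the pagelinks dump already lets you pull those
co-occurrences out with something like this (the predicate name is the
one the current dumps use, so double-check it against whatever version
you load):

PREFIX dbo: <http://dbpedia.org/ontology/>

# Every pair of resources the List page links to -- weak but useful
# evidence that the two are related.
SELECT ?a ?b WHERE {
  <http://dbpedia.org/resource/List_of_Star_Wars_characters>
      dbo:wikiPageWikiLink ?a , ?b .
  FILTER ( ?a != ?b )
}
LIMIT 100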

     A project that I've thought would be fun would be to produce
replacement files for the NT dumps that DBpedia publishes, based on
Freebase data. Most of the DBpedia ontology could be populated from
Freebase data, and in many cases Freebase would be more accurate. It
wouldn't be possible to get the categories or the pagelinks, and
information about the List* pages would be lost, but I think a lot of
it could be reconstructed.
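
     The mechanics for any one property aren't complicated -- it's
basically a CONSTRUCT over the sameAs links, something like the sketch
below, repeated per property and then serialized to NT. The Freebase
property URI is from memory, and a real run would have to sort out
datatypes and conflicting values:

PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

# Rebuild one slice of the DBpedia ontology (birth dates) from the
# Freebase side of the owl:sameAs links.
CONSTRUCT { ?dbpResource dbo:birthDate ?dob }
WHERE {
  ?dbpResource owl:sameAs ?fbTopic .
  FILTER ( regex(str(?fbTopic), "^http://rdf.freebase.com/") )
  ?fbTopic <http://rdf.freebase.com/ns/people.person.date_of_birth> ?dob .
}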



