On 11/22/2011 6:06 AM, Yury Katkov wrote:
>
> 1) I'm thinking of writing a tutorial on how to use DBpedia and
> Freebase together in practical projects. Is there any similar
> tutorial out there already, and would it be useful in your opinion?
What people need here is a product, not a tutorial.
The FactForge query editor does a "reasonable" job of combining
data from DBpedia and Freebase and making it queryable through SPARQL.
Like most similar things, it doesn't provide a consistent
point of view. If you look at the sample queries at the bottom of the
page, you often see them using multiple predicates to get information
that came in through different sources. owl:sameAs gets used to glom
together Freebase and DBpedia concepts; I don't know where they got
their owl:sameAs statements, but I know that if you use the ones that
come with DBpedia you'll find some concepts end up getting lost or
confused in big Katamari balls.
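To give a concrete idea of what those sample queries end up looking
like, here's a rough Python/SPARQL sketch of asking for a single fact
and having to try predicates from more than one source vocabulary,
glued together by owl:sameAs. The endpoint URL and the Freebase
predicate IRI are guesses on my part -- check FactForge's own sample
queries for the real ones:

from SPARQLWrapper import SPARQLWrapper, JSON

# Assumed endpoint; the Freebase predicate IRI below is also an assumption.
sparql = SPARQLWrapper("http://factforge.net/sparql")
sparql.setQuery("""
PREFIX owl:     <http://www.w3.org/2002/07/owl#>
PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX dbo:     <http://dbpedia.org/ontology/>

SELECT ?pop WHERE {
  { dbpedia:Berlin dbo:populationTotal ?pop }      # value from DBpedia
  UNION
  { dbpedia:Berlin owl:sameAs ?fb .                # hop to the Freebase id
    ?fb <http://rdf.freebase.com/ns/location.statistical_region.population> ?pop }
}
""")
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["pop"]["value"])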
If you want to write simple queries and get consistently good
results, you need some process where you clean the data up, decide what
you believe when there is contradictory information, and so on. It
would really be great if we had some system that could represent
"john thinks that mary said Lady Gaga is a man", but the strategy of
throwing it all in a triple store and hoping people aren't going to
notice they're getting bad results doesn't cut it.
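I don't have a real provenance-aware system to point at, but just to
illustrate the "decide what you believe" step, here is a toy sketch;
the source ranking and the facts in it are made up:

# Toy illustration of the "decide what you believe" step: when sources
# disagree, keep the value from the source you trust most and remember
# that it was disputed, instead of loading every contradictory triple.
TRUST = {"geonames": 3, "freebase": 2, "wikipedia-infobox": 1}  # made-up ranking

def resolve(candidates):
    """candidates: (value, source) pairs for one subject/predicate."""
    value, source = max(candidates, key=lambda c: TRUST.get(c[1], 0))
    disputed = len({v for v, _ in candidates}) > 1
    return {"value": value, "source": source, "disputed": disputed}

print(resolve([("male", "wikipedia-infobox"), ("female", "freebase")]))
# {'value': 'female', 'source': 'freebase', 'disputed': True}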
> 2) can we generate new mappings or improve the extraction scripts by
> analysing and parsing Freebase data? Is that a good idea or is there
> some kind of redundancy there? Is it legal with respect to Google's
> licences?
>
I think DBpedia has a different philosophy than Freebase.
DBpedia combines a Wikipedia dump with a rulebox that creates a set
of triples. The rulebox isn't capable of doing deep cleanup on data.
The "correct" way to fix something in DBpedia is to fix the data in
Wikipedia. If I've got an automated process that, say, reconciles
places with Geonames and produces geographic coordinates for 200,000
places, the only way I can get this data in is to get it into Wikipedia,
and that's a very difficult proposition, because there are many different
infobox templates for different kinds of locations, in which coordinates
are represented differently.
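Just to make the pain concrete: the same coordinates show up in
wikitext in quite different shapes, so any bulk edit has to handle each
template separately. The two fragments parsed below are simplified
stand-ins, not exact template syntax:

import re

def dms_to_decimal(d, m, s, hemi):
    val = float(d) + float(m) / 60 + float(s) / 3600
    return -val if hemi in ("S", "W") else val

def parse_coord_template(text):
    # shape like {{coord|52|31|12|N|13|24|18|E}}
    m = re.search(r"\{\{coord\|(\d+)\|(\d+)\|(\d+)\|([NS])\|(\d+)\|(\d+)\|(\d+)\|([EW])\}\}", text)
    return (dms_to_decimal(*m.group(1, 2, 3, 4)),
            dms_to_decimal(*m.group(5, 6, 7, 8)))

def parse_settlement_fields(text):
    # shape like | latd = 52 | latm = 31 | latNS = N | longd = 13 ...
    f = dict(re.findall(r"\|\s*(\w+)\s*=\s*(\w+)", text))
    return (dms_to_decimal(f["latd"], f["latm"], f.get("lats", 0), f["latNS"]),
            dms_to_decimal(f["longd"], f["longm"], f.get("longs", 0), f["longEW"]))

print(parse_coord_template("{{coord|52|31|12|N|13|24|18|E}}"))
print(parse_settlement_fields("| latd = 52 | latm = 31 | latNS = N | longd = 13 | longm = 24 | longEW = E"))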
On the other hand, if you want to bulk insert data into Freebase,
or fix a fact that's wrong, it's pretty easy to do.
DBpedia might look like a database about "topics", but it's really
a database about Wikipedia pages -- which include things like
http://en.wikipedia.org/wiki/List_of_Star_Wars_characters
Freebase filters pages like this out, because they're not really
"things". Yet these records could be very valuable for information
extraction. The Wikipedia pagelinks from that page give evidence that
there's a connection between :Boba_Fett and :Han_Solo, for instance,
and there are many List pages that are very minable, even though
DBpedia doesn't mine them.
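A sketch of the sort of mining I mean: read the pagelinks NT dump and
pair up everything a List_of_ page links to. The dump filename and the
pagelink predicate IRI differ between DBpedia releases, so treat both
as placeholders:

import re
from collections import defaultdict
from itertools import combinations

PAGELINK = "http://dbpedia.org/ontology/wikiPageWikiLink"  # check your release
TRIPLE = re.compile(r"<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>\s*\.")

links_from_list = defaultdict(set)
with open("page_links_en.nt") as dump:                     # placeholder filename
    for line in dump:
        m = TRIPLE.match(line)
        if m and m.group(2) == PAGELINK and "/List_of_" in m.group(1):
            links_from_list[m.group(1)].add(m.group(3))

# Each pair of resources linked from the same List page is weak evidence of
# a connection -- :Boba_Fett and :Han_Solo both show up on the Star Wars list.
for list_page, targets in links_from_list.items():
    for a, b in combinations(sorted(targets), 2):
        print(list_page, a, b)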
A project that I've thought would be fun would be to produce
replacement files, based on Freebase data, for the NT dumps that
DBpedia publishes. Most of the DBpedia ontology could be populated from
Freebase data, and in many cases Freebase would be more accurate. It
wouldn't be possible to get the categories or the pagelinks, and
information about List_* pages would be lost, but I think a lot of it
could be reconstructed.
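As a sketch of what one of those replacement files could look like:
walk a Freebase dump, map the properties you care about onto DBpedia
ontology properties, and emit N-Triples against DBpedia resource IRIs.
The input format (tab-separated mid / property / value lines), the
property mapping, and the mid-to-title lookup are all assumptions here:

PROPERTY_MAP = {  # Freebase property -> DBpedia ontology property (illustrative)
    "/people/person/date_of_birth": "http://dbpedia.org/ontology/birthDate",
}

def dbpedia_iri(wikipedia_title):
    return "http://dbpedia.org/resource/" + wikipedia_title.replace(" ", "_")

def emit_replacement_nt(freebase_dump_path, mid_to_title, out_path):
    """mid_to_title: dict from Freebase mids to English Wikipedia titles."""
    with open(freebase_dump_path) as src, open(out_path, "w") as out:
        for line in src:
            mid, prop, value = line.rstrip("\n").split("\t")[:3]
            if prop in PROPERTY_MAP and mid in mid_to_title:
                out.write('<%s> <%s> "%s" .\n' % (
                    dbpedia_iri(mid_to_title[mid]), PROPERTY_MAP[prop], value))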