On 9/27/10 4:42 PM, Paul Houle wrote:
>    I've recently put up a site that uses coordinate information from
> Freebase and Dbpedia,  and I'm starting to think about how to clean up
> certain data quality problems I'm encountering,  for instance,  see:
>
> http://ookaboo.com/o/pictures/topic/209440/Oakville_Assembly
>
> In this particular case,  I've only got data from dbpedia,  which drops
> the point a few hundred km from where it really is...  It's obvious that
> this is a bad one because it's right in the middle of Lake Erie.
> Freebase doesn't have any coordinate for this thing (seems to me that it
> should),  and at the moment,  Wikipedia has the right coordinates (at
> least on Google Maps I see a big factory building).  My guess is that
> Wikipedia might have been wrong at one time,  and has had it corrected.
> It's also possible that the conversion wasn't done right in dbpedia,
> since coordinates are represented differently in a few hundred different
> infoboxes.
>
> It seems to me that both the number of points and the quality of points
> in Wikipedia has been improving dramatically over the last two years...
> About a year ago I plotted the points for Staten Island Railroad
> stations and found that the railroad was displaced a few km east and ran
> right under the middle of the Tappan Zee Bridge...  Now it's much better.
>
> I can find examples where:
>
> (a) dbpedia is right and freebase is wrong (for instance,  a town in
> continental Europe gets its longitude sign flipped and ends up with the
> wrecked ships west of the UK -- maybe here the point got fixed in
> wikipedia but not in freebase)
> (b) dbpedia is wrong and freebase is right
> (c) a point is missing from dbpedia but is in freebase (I see a lot of
> these in Switzerland),  and
> (d) a point is missing from freebase but in dbpedia
>
> An analysis of this is tricky because there are a lot of things where
> the coordinates are iffy:  the location of 'Russia' could vary within a
> few thousand kilometers,  'Tompkins County' could vary by ten or so
> kilometers,  etc.
>
> Looking at a handful of points that have diverged,  I get the impression
> that freebase is more accurate than dbpedia,  but that I get better
> results just looking at the coordinates on the human interface of
> wikipedia -- currently,  it seems like a scan of the current coordinates
> in wikipedia (however wikipedia extracts them from the infoboxes)
> benefits the most from the human labor being done to fix points and also
> avoids errors & missed points from other people's extraction pipelines.
>
>   From my viewpoint,  I'd like to make a map that doesn't have
> embarrassing errors in it...  What's the best way to clean up this mess?
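
As a rough way to sort entities into the cases (a) through (d) listed above, here is a minimal Python sketch. It assumes each source yields an optional (lat, lon) pair in decimal degrees. The function names and the tolerance table are illustrative assumptions, not part of either dataset; the tolerance values only echo the "few thousand km for a country, ten or so km for a county" intuition and would need tuning.

```python
import math

EARTH_RADIUS_KM = 6371.0

# Hypothetical per-type tolerances in km; the values are assumptions.
TOLERANCE_KM = {"country": 2000.0, "county": 10.0, "building": 1.0}


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def compare(dbpedia, freebase, tolerance_km):
    """Classify a pair of optional (lat, lon) points from the two sources."""
    if dbpedia is None and freebase is None:
        return "missing in both"
    if dbpedia is None:
        return "missing in dbpedia"
    if freebase is None:
        return "missing in freebase"
    if haversine_km(dbpedia, freebase) <= tolerance_km:
        return "agree"
    # A flipped longitude sign is one known failure mode, so test for it.
    flipped = (freebase[0], -freebase[1])
    if haversine_km(dbpedia, flipped) <= tolerance_km:
        return "longitude sign flip"
    return "disagree"
```

This only tells you that two sources diverge, not which one is right; a "disagree" or "longitude sign flip" result still needs a third reference (for instance the current Wikipedia page) or a human check.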

You have two data spaces, DBpedia and Freebase. You should make a third:
yours, which I think you already have via ookaboo.

Place the fixed (cleansed) data in your ookaboo data space, and connect
the coreferenced entities using an "owl:sameAs" relation. Scope queries
that are accuracy-sensitive to your ookaboo data space. When data quality
requirements are low and coverage requirements are high, use inference
rules for union expansion across DBpedia and Freebase via "owl:sameAs".
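
The two query modes can be sketched in plain Python, with dictionaries standing in for the three data spaces and for the "owl:sameAs" links. Everything here is a hypothetical illustration: the identifiers, the coordinates, and the function names are made up, and a real setup would do this with SPARQL and inference rules in the triple store rather than in application code.

```python
# Hypothetical data, for illustration only.
CURATED = {"mine:A": (43.0, -79.0)}            # cleansed data space
SOURCES = {
    "dbpedia": {"dbpedia:A": (42.0, -81.0), "dbpedia:B": (46.0, 7.0)},
    "freebase": {"freebase:B": (46.1, 7.1)},
}
# sameAs links from entities in the cleansed space to the other spaces.
SAME_AS = {
    "mine:A": ["dbpedia:A"],
    "mine:B": ["dbpedia:B", "freebase:B"],
}


def curated_lookup(entity, curated):
    """Accuracy-sensitive query: consult only the cleansed data space."""
    return curated.get(entity)


def expanded_lookup(entity, curated, sources, same_as):
    """Coverage-oriented query: follow sameAs links into the other spaces."""
    found = []
    if entity in curated:
        found.append(("curated", curated[entity]))
    for alias in same_as.get(entity, []):
        for name, data in sources.items():
            if alias in data:
                found.append((name, data[alias]))
    return found
```

With this data, curated_lookup("mine:B", CURATED) returns nothing because mine:B has not been cleansed yet, while expanded_lookup still finds it in both upstream sources.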

That's how you clean up the mess and potentially get compensated for 
doing so, in the process :-)


Kingsley
> ------------------------------------------------------------------------------
> Start uncovering the many advantages of virtual appliances
> and start using them to simplify application deployment and
> accelerate your shift to cloud computing.
> http://p.sf.net/sfu/novell-sfdev2dev
> _______________________________________________
> Dbpedia-discussion mailing list
> Dbpedia-discussion@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>


-- 

Regards,

Kingsley Idehen 
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen





