Davide Palmisano wrote:
>   
>   
> Yes, that's my fear. In fact, some months ago I was doing a massive 
> ingestion from dbpedia and my business logic didn't provide this kind of 
> custom logic. The result was that a lot of dbpedia resources were not 
> ingested.
>   
    That's why, when I do generic database work (with either dbpedia or 
freebase), I like to download the whole thing. If you're doing a little 
bit at a time, integrity problems have a way of going unnoticed. For 
instance, when I loaded all of dbpedia into a non-RDF system that was 
case insensitive for titles, it became clear that wikipedia has 10,000 or 
so entry pairs like this:

http://en.wikipedia.org/wiki/Direct_instruction
http://en.wikipedia.org/wiki/Direct_Instruction

    Here you've got two similar but distinct items whose titles differ 
only by case. In general you want a human-friendly search engine or 
search completion facility to be case insensitive: "alan alda" should 
turn up

http://en.wikipedia.org/wiki/Alan_Alda

    but you also need something that can correctly handle labels/URLs 
that vary only by case without breaking. Had I used a standard RDF 
stack for my work with dbpedia, I would have blasted right by many 
integrity issues that the system I built forced me to confront.
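
    To make the idea concrete, here is a rough Python sketch of one way to 
get both behaviours: a lookup that ignores case, and a report of titles 
that collide once case is ignored. This is not the system I described 
above. The function names (build_index, lookup, case_collisions) and the 
choice to treat underscores as spaces are assumptions made only for 
illustration.

```python
from collections import defaultdict


def normalize(text):
    # Assumption for this sketch: underscores and spaces are equivalent,
    # and case is ignored via str.casefold().
    return text.replace("_", " ").casefold()


def build_index(titles):
    """Map each case-insensitive key to the set of exact titles behind it."""
    index = defaultdict(set)
    for title in titles:
        index[normalize(title)].add(title)
    return index


def lookup(index, query):
    """Case-insensitive search that returns every exact title that matches."""
    return sorted(index.get(normalize(query), ()))


def case_collisions(index):
    """Keys with more than one exact title, i.e. titles differing only by case."""
    return {key: sorted(variants)
            for key, variants in index.items()
            if len(variants) > 1}


titles = ["Direct_instruction", "Direct_Instruction", "Alan_Alda"]
index = build_index(titles)

print(lookup(index, "alan alda"))
print(case_collisions(index))
```

    Because the index keeps the exact titles as values, a search for 
"direct instruction" returns both entries instead of silently merging 
them, and case_collisions gives a list of the pairs that need a human 
(or at least a rule) to sort out.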


_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
