Hi Max!

Max Jakob wrote on Wednesday, 07 July 2010:
> Hi Tyler,
>
> the extracted URIs are based on Infobox mappings and the data of the
> Wikipedia. The URIs you listed are identical to the ones in Wikipedia.
> Our framework is quite tolerant regarding valid URIs as it is supposed
> to represent the original Wikipedia data.
> We could easily use a stricter URI validation, but don't want to throw
> away data which might be useful to others.

That's what I presumed. It's really curious: some of the errors are slightly
invalid URIs (http//www.foo.com), while others are entirely incorrect (None).

I have slowly but surely been fixing the errors as I come across them in the
file, but it's slow going (I can't figure out how to get dbpedia to ignore
erroneous entries). :(

> Cheers,
> Max
>
> On Sat, Jun 19, 2010 at 12:24 AM, R. Tyler Ballance <[email protected]>
> wrote:
> > I'm working with 3.5.1, and I've noticed that mappingbased_proopeties_en.nt,
> > compared to the other sets that I've worked with is *full* of errors that
> > break my imports in funky ways.
> >
> > There are a number of non-absolute URLs:
> >
> > ERROR: Malformed document: Not a valid (absolute) URI:
> > www.newfreedomboro.org/index2.htm [line 600646]
> > ERROR: Malformed document: Not a valid (absolute) URI:
> > www.rubenblades.com [line 975491]
> > ERROR: Malformed document: Not a valid (absolute) URI: Fansite [line
> > 1056096]
> > ERROR: Malformed document: Not a valid (absolute) URI: None [line 278162]
> >
> > (Just as a couple examples)
> >
> > As a matter of practice, I've been just dropping malformed entities from the
> > file but I'm wondering if there's anything I can do to track down the
> > errors to help improve the next release?
> >
> > Would filing a ticket with a unified diff of the 3.5.1
> > mappingbased_proopeties_en.nt file compared to my modified one be helpful?
> >
> > Cheers,
> > -R. Tyler Ballance

Cheers,
-R. Tyler Ballance
--------------------------------------
Jabber: [email protected]
GitHub: http://github.com/rtyler
Identica: http://identi.ca/dero
Twitter: http://twitter.com/agentdero
Blog: http://unethicalblogger.com
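
For anyone hitting the same import errors, here is a minimal sketch of the "drop the malformed entries" workaround mentioned above. It is not part of the DBpedia framework and not a full N-Triples parser: the names `is_absolute_uri`, `line_is_clean` and `split_ntriples` are made up for this illustration, the sample triples are invented, and "absolute" is approximated as "starts with a scheme followed by a colon".

```python
import re

# Assumption: a URI counts as absolute if it starts with "scheme:" (RFC 3986 style).
ABSOLUTE_URI = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*:\S+")
# Quoted literals are blanked out first so "<...>" inside a string is not checked.
LITERAL = re.compile(r'"(?:[^"\\]|\\.)*"')
# Anything left between angle brackets is treated as a URI reference.
URI_REF = re.compile(r"<([^<>]*)>")


def is_absolute_uri(uri):
    """Return True if the text looks like an absolute URI."""
    return ABSOLUTE_URI.fullmatch(uri) is not None


def line_is_clean(line):
    """Return True if every <...> reference on the line is an absolute URI."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return True  # blank lines and comments are passed through
    without_literals = LITERAL.sub('""', stripped)
    return all(is_absolute_uri(u) for u in URI_REF.findall(without_literals))


def split_ntriples(lines):
    """Split lines into (kept, dropped); dropped entries carry 1-based line numbers."""
    kept, dropped = [], []
    for number, line in enumerate(lines, start=1):
        if line_is_clean(line):
            kept.append(line)
        else:
            dropped.append((number, line))
    return kept, dropped


if __name__ == "__main__":
    # Invented sample data, modelled on the errors quoted in this thread.
    sample = [
        "<http://example.org/A> <http://example.org/website> <http://www.example.org/> .",
        "<http://example.org/B> <http://example.org/website> <www.rubenblades.com> .",
        "<http://example.org/C> <http://example.org/website> <None> .",
        '<http://example.org/D> <http://example.org/name> "a <b> c"@en .',
    ]
    kept, dropped = split_ntriples(sample)
    for number, line in dropped:
        print("dropping line %d: %s" % (number, line))
```

The list of dropped lines, with their line numbers, could also serve as the basis for the kind of error report asked about in the original message. For a file the size of the real dump, one would probably stream it line by line rather than build lists in memory.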
_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
