Hi Max!

Max Jakob wrote on Wednesday, 7 July 2010:

> Hi Tyler,
> 
> the extracted URIs are based on Infobox mappings and the data of the
> Wikipedia. The URIs you listed are identical to the ones in Wikipedia.
> Our framework is quite tolerant regarding valid URIs as it is supposed
> to represent the original Wikipedia data.
> We could easily use a stricter URI validation, but don't want to throw
> away data which might be useful to others.

That's what I presumed. It's curious, though: some of the errors are slightly
malformed URIs (http//www.foo.com), while others are not URIs at all (None).


I've slowly but surely been fixing the errors as I come across them in the
file, but it's slow going (I can't figure out how to get DBpedia to ignore
erroneous entries). :(
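
One way around entries like these is to pre-filter the dump before importing
it. The following is only a minimal sketch of that idea, not something DBpedia
or any RDF toolkit ships: the names looks_absolute and split_triples are
hypothetical, the regular expressions only approximate N-Triples syntax and
the RFC 3986 scheme rule, and literals and non-triple lines are passed through
untouched.

```python
import re

# Rough shape of an N-Triples statement: <subject> <predicate> object .
# Assumption: subjects and predicates are always <...> terms (no blank nodes).
TRIPLE = re.compile(r"^\s*<([^>]*)>\s+<([^>]*)>\s+(.*?)\s*\.\s*$")

# Approximation of "absolute": an RFC 3986 style scheme, a colon, then something.
ABSOLUTE = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*:.+")


def looks_absolute(uri):
    """Return True if uri starts with a scheme such as 'http:'."""
    return ABSOLUTE.match(uri) is not None


def split_triples(lines):
    """Partition lines into (kept, rejected).

    A line is rejected when its subject, predicate, or URI object does not
    look like an absolute URI. Rejected entries are (line_number, line)
    pairs, numbered from 1, so they can be reported or diffed later.
    """
    kept, rejected = [], []
    for number, line in enumerate(lines, start=1):
        match = TRIPLE.match(line)
        if match is None:
            # Not recognisable as a triple (blank line, comment): keep as-is.
            kept.append(line)
            continue
        subject, predicate, obj = match.groups()
        uris = [subject, predicate]
        if obj.startswith("<") and obj.endswith(">"):
            # Object is a URI reference; literals start with a quote instead.
            uris.append(obj[1:-1])
        if all(looks_absolute(uri) for uri in uris):
            kept.append(line)
        else:
            rejected.append((number, line))
    return kept, rejected
```

Passing an open file object as lines streams the dump line by line; writing
the rejected pairs to a separate file would give a list of line numbers and
offending entries that could be attached to a bug report.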



> Cheers,
> Max
> 
> On Sat, Jun 19, 2010 at 12:24 AM, R. Tyler Ballance <[email protected]> wrote:
> > I'm working with 3.5.1, and I've noticed that mappingbased_properties_en.nt,
> > compared to the other sets that I've worked with, is *full* of errors that
> > break my imports in funky ways.
> >
> > There are a number of non-absolute URLs:
> >
> >    ERROR: Malformed document: Not a valid (absolute) URI: www.newfreedomboro.org/index2.htm [line 600646]
> >    ERROR: Malformed document: Not a valid (absolute) URI: www.rubenblades.com [line 975491]
> >    ERROR: Malformed document: Not a valid (absolute) URI: Fansite [line 1056096]
> >    ERROR: Malformed document: Not a valid (absolute) URI: None [line 278162]
> >
> > (Just a couple of examples)
> >
> > As a matter of practice, I've just been dropping malformed entities from the
> > file, but I'm wondering if there's anything I can do to track down the errors
> > to help improve the next release?
> >
> > Would filing a ticket with a unified diff of the 3.5.1
> > mappingbased_properties_en.nt file compared to my modified one be helpful?
> >
> >
> > Cheers,
> > -R. Tyler Ballance
> > --------------------------------------
> >  Jabber: [email protected]
> >  GitHub: http://github.com/rtyler
> > Identica: http://identi.ca/dero
> >  Twitter: http://twitter.com/agentdero
> >    Blog: http://unethicalblogger.com
> >
> >
> > _______________________________________________
> > Dbpedia-discussion mailing list
> > [email protected]
> > https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
> >
> >
Cheers,
-R. Tyler Ballance
