[Dbpedia-developers] Invalid content in dbpedia long abstracts

Robert David Mon, 03 Mar 2014 01:59:46 -0800

Hi,

I noticed that the long abstracts (I checked en and de) contain
leftover HTML tags, probably from the extraction process.
These prevent XML processing of the content because they break the structure.
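
To illustrate the kind of breakage, here is a minimal Python sketch. It is not
taken from the extraction framework: the sample text, the wrapper element name
"abstract" and the helper is_well_formed_fragment are made up for this example.
It wraps an abstract in an element and checks whether an XML parser accepts it.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed_fragment(text: str) -> bool:
    """Return True if `text` parses as the content of a single XML element."""
    try:
        # "abstract" is a hypothetical wrapper element used only for this check.
        ET.fromstring("<abstract>" + text + "</abstract>")
        return True
    except ET.ParseError:
        return False


# Made-up sample with an unclosed HTML tag, similar in kind to the leftovers.
sample = "Vienna is the capital of Austria.<br>It lies on the Danube."

raw_ok = is_well_formed_fragment(sample)              # unclosed <br> breaks parsing
escaped_ok = is_well_formed_fragment(escape(sample))  # escaped markup is plain text
```

Escaping only keeps the XML well-formed; the tags would still show up as literal
text in the abstract, so removing them during extraction is probably the better fix.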


Also, it seems that the content contains some invalid characters.
They are not a problem in the other serialization formats, but they are
disallowed in XML.
Please see http://www.w3.org/TR/2000/REC-xml-20001006#NT-Char.
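
As a rough sketch of what a filter for this could look like, the Python snippet
below removes every character that falls outside the Char production of the
XML 1.0 spec linked above. The function name strip_invalid_xml_chars is made up
for this example, and dropping the characters (rather than replacing them) is
just one possible choice.

```python
import re

# Complement of the XML 1.0 Char production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text: str) -> str:
    """Remove all characters that XML 1.0 does not allow in a document."""
    return _INVALID_XML_CHARS.sub("", text)


# Made-up input: NUL and vertical tab are disallowed, tab is allowed.
cleaned = strip_invalid_xml_chars("a\x00b\x0bc\td")
```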

I created two issues for these on GitHub, #185 and #186.

Cheers
Robert


-- 

Robert David
Software Developer

Semantic Web Company GmbH
Mariahilfer Straße 70 / 8
A - 1070 Vienna, Austria
Tel +43 1 402 12 35
Fax +43 1 402 12 35 - 22

http://www.semantic-web.at
http://blog.semantic-web.at
http://poolparty.biz

LOD2 - Creating Knowledge out of Interlinked Data - http://lod2.eu/


