Hi,
I noticed that the long abstracts (I checked en and de) contain leftover
HTML tags, probably from the extraction process.
These break the document structure and so prevent XML processing of the
content.
The content also seems to contain some invalid characters.
They are not a problem in the other serialization formats, but they are
not allowed in XML.
Please see http://www.w3.org/TR/2000/REC-xml-20001006#NT-Char.
I created two issues for these on GitHub, #185 and #186.
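To illustrate both problems, here is a minimal Python sketch of how a consumer might work around them for now. It is not the extraction framework's code; the function names are made up for this example, and it assumes the abstract is available as a plain string. The character class is the complement of the Char production in the XML 1.0 spec linked above. Note that it escapes leftover tags instead of removing them, which is only a stopgap.

```python
import re
from xml.sax.saxutils import escape

# Everything outside the XML 1.0 Char production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text):
    """Drop characters that XML 1.0 does not allow (hypothetical helper)."""
    return _INVALID_XML_CHARS.sub("", text)


def to_xml_text(text):
    """Make an abstract safe to embed as XML character data.

    Leftover markup such as <b> is escaped, not removed, so it can no
    longer break the structure of the surrounding document.
    """
    return escape(strip_invalid_xml_chars(text))
```

With this, wrapping `to_xml_text(abstract)` in an element should give something an XML parser accepts, although the proper fix is still to remove the tags and characters during extraction.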
Cheers
Robert
--
Robert David
Software Developer
Semantic Web Company GmbH
Mariahilfer Straße 70 / 8
A - 1070 Vienna, Austria
Tel +43 1 402 12 35
Fax +43 1 402 12 35 - 22
http://www.semantic-web.at
http://blog.semantic-web.at
http://poolparty.biz
LOD2 - Creating Knowledge out of Interlinked Data - http://lod2.eu/
_______________________________________________
Dbpedia-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-developers