Hi Folks,

I was recently messing around with the Web Data Commons (WDC) dataset, with the aim of setting up a Jena TDB store of the entire most recent WDC release. When attempting to import a subset of the WDC N-Quads into TDB2, I kept running into errors similar to the following:

[2018-10-03 22:57:24] Fuseki ERROR [line: 102379, col: 115] Illegal character in IRI (codepoint 0x7C, '|'): < http://www.hsamuel.co.uk/l/gifts/category[|]...>
[2018-10-03 22:57:26] Fuseki INFO [10] 400 Parse error: [line: 102379, col: 115] Illegal character in IRI (codepoint 0x7C, '|'): < http://www.hsamuel.co.uk/l/gifts/category[|]...> (6.786 s)

The quad is as follows:

_:genid2d8089eee9237845cab8ffa262694474f12db3 <http://schema.org/item> < http://www.hsamuel.co.uk/l/gifts/category|ladies%20accessories/> < http://www.hsamuel.co.uk/webstore/d/2821133/happy+40th+anniversary+champagne+flutes/> .

As you can see above, the '|' vertical bar in the IRI is what raises the error on the TDB2 side.

When I ran this through the any23.org service with the validate+fix, report and annotate parameters set to true, I got the following report:

<?xml version="1.0" encoding="UTF-8" ?>
<response>
  <extractors>
    <extractor>rdf-nq</extractor>
  </extractors>
  <report>
    <message/>
    <error/>
    <issueReport>
      <extractorIssues extractor="rdf-nq">
        <issue level="ERROR" row="1" col="-1">Unexpected character U+7C at index 41: http://www.hsamuel.co.uk/l/gifts/category|ladies%20accessories/</issue>
      </extractorIssues>
    </issueReport>
    <validationReport>
      <issues>
      </issues>
      <ruleActivations>
      </ruleActivations>
      <errors>
      </errors>
    </validationReport>
  </report>
  <data>
<![CDATA[
[ {
  "@graph" : [ {
    "@id" : "http://www.hsamuel.co.uk/l/gifts/category|ladies%20accessories/",
    "http://schema.org/name" : [ {
      "@value" : "Ladies Accessories"
    } ]
  } ],
  "@id" : "http://www.hsamuel.co.uk/webstore/d/2821133/happy+40th+anniversary+champagne+flutes/"
} ]
]]>
  </data>
</response>

I thought the Subject should be fixed, with the vertical bar replaced by the encoded vertical bar character (%7C), but this doesn't seem to be the case.

Lewis

--
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc
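
To illustrate the kind of fix described above, here is a minimal Python sketch of a pre-processing step that percent-encodes characters the N-Quads IRIREF grammar does not allow (such as '|') before the data is loaded. It is not part of Jena or Any23, and the names encode_iri and fix_line are made up for this example. It assumes one statement per line, and it leaves backslashes alone so that existing \uXXXX escapes survive. Note that percent-encoding produces a different IRI string, so any other data referring to the original IRI would need the same treatment.

```python
import re

# Characters the N-Quads IRIREF production does not allow unescaped.
# Backslash is deliberately left out so \uXXXX escapes are not damaged.
ILLEGAL_IN_IRI = re.compile(r'[\x00-\x20<>"{}|^`]')

# Either a quoted string literal (left untouched) or an <IRI> term.
TERM = re.compile(r'"(?:[^"\\]|\\.)*"|<([^<>]*)>')


def encode_iri(iri: str) -> str:
    """Percent-encode each illegal (ASCII) character, e.g. '|' -> '%7C'."""
    return ILLEGAL_IN_IRI.sub(lambda m: f"%{ord(m.group()):02X}", iri)


def fix_line(line: str) -> str:
    """Rewrite the IRI terms of one N-Quads line, skipping string literals."""
    def repl(m: re.Match) -> str:
        if m.group(1) is None:  # the literal alternative matched
            return m.group(0)
        return "<" + encode_iri(m.group(1)) + ">"
    return TERM.sub(repl, line)


if __name__ == "__main__":
    # IRI taken from the any23 issue message above.
    print(encode_iri(
        "http://www.hsamuel.co.uk/l/gifts/category|ladies%20accessories/"
    ))
```

Applied line by line to the dump (for example, reading the input file and writing fix_line(line) for each line to a new file), this should leave valid statements unchanged. The existing %20 is not double-encoded because '%' is not in the illegal set.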
