Yes, the tests are designed to be pragmatic.

If you are processing large amounts of data on Hadoop there are two cases:
- You want to skip/ignore bad data
- You want to fail fast on bad data

The failing tests are presumably the ones testing the second case. My general hacky approach to testing that is simply to generate some valid data followed by some junk data.

If we change to the JSON-LD behaviour then those tests in Elephas that cover JSON-LD will need to change to generate a valid JSON object that happens to be invalid with respect to JSON-LD, but since I don't know JSON-LD (and have zero desire to learn) I don't know what we'd need to generate to do that.

Rob

On 04/10/2015 10:02, "Andy Seaborne" wrote:

>Claude,
>
>The point is more on the pragmatic side than the ideal design with a
>tradeoff between maintaining our own code vs using a maintained library.
>
>The jsonld-java parsing process isn't streaming in either use case so
>it's not a case of some triples read from the input. The jsonld-java
>process is layered, not streamed - all the JSON parsing is done, then
>the conversion to RDF happens.
>
>The two processes are:
>
>(Jena calling low level, non-API calls of jsonld-java):
>1a/ Parse JSON
>2a/ Do all triples
>3a/ Check for trailing junk
>
>vs
>
>(jsonld-java API)
>1b/ Parse JSON
>2b/ Check for trailing junk
>3b/ Do all triples
>
>I am wondering if the Elephas tests are tuned to the way Jena works in
>these error cases, rather than relying on a feature of it.
>
> Andy
>
>AbstractWholeFileQuadInputFormatTests
>
>On 04/10/15 09:19, Claude Warren wrote:
>> not Rob but my 2 cents.....
>>
>> I think that when we read turtle documents if there is an error the
>> triples we have already read are left in the graph/model (yes,
>> transactions can change this). Shouldn't all parsers follow the same
>> pattern?
>>
>> Currently that pattern seems to be: read until eof or error and process
>> what was read.
>>
>> Unless I am wrong about the above, I think that the JSON parser should
>> return the json object that was parsed before the junk.
>>
>> Claude
>>
>> On Sat, Oct 3, 2015 at 7:21 PM, Andy Seaborne wrote:
>>
>>> Upgrading the dependency for jsonld-java to 0.7.0 picks up a bug fix
>>> (jsonld-java issue 144 [1]) that Jena has a workaround for.
>>>
>>> The issue is that the Jackson JSON parser does not flag trailing junk.
>>> It reads the JSON object and stops there. Worse, it creates a buffered
>>> reader so the caller can't handle the stream afterwards.
>>>
>>> ---------------
>>> {
>>> "@id" : "http://example/s",
>>> "http://example/p" : "str"
>>> }
>>> xxxxxxxxxxxxxxx
>>> ---------------
>>>
>>> Jena (JsonLdReader) contains code taken from jsonld-java and modified
>>> to run the Jackson JSON parser, produce triples and then check for
>>> trailing junk. The trailing-junk detection was contributed back to the
>>> project (PR 145).
>>>
>>> jsonld-java treats it more systematically.
>>>
>>> If the JSON is syntactically bad in the {}, no triples emerge. The
>>> process is: completely read the JSON object, then let the RDF
>>> conversion run. Bad object -> no RDF at all.
>>>
>>> If there is trailing junk, it is detected before passing up the JSON
>>> object, so trailing junk means no triples, unlike Jena currently.
>>>
>>> I had hoped to remove the workaround and not duplicate jsonld-java
>>> code.
>>>
>>> Elephas testing is impacted. It is sensitive to the "JSON object,
>>> trailing junk, triples" vs "JSON object, triples, trailing junk"
>>> differences.
>>>
>>> Unless there is a specific reason to support that behaviour, I'd like
>>> to switch to jsonld-java behaviour.
>>>
>>> (Rob) Thoughts?
>>>
>>> Andy
>>>
>>> [1] https://github.com/jsonld-java/jsonld-java/issues/144
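
For anyone following the thread, the two orderings (1a-3a vs 1b-3b) can be sketched in a few lines. This is a toy illustration only, not Jena or jsonld-java code: Python's `json.JSONDecoder.raw_decode` stands in for a parser that reads one JSON object and stops, and `to_triples`, `parse_first_object`, `check_trailing_junk` and the two reader functions are hypothetical names invented for the sketch. The "triples" are plain tuples rather than real RDF.

```python
import json

# The example document from the thread: a JSON object followed by junk.
DOC = '{ "@id" : "http://example/s", "http://example/p" : "str" }\nxxxxxxxxxxxxxxx\n'


def to_triples(obj):
    """Hypothetical stand-in for the JSON-LD to RDF step: one tuple per property."""
    subject = obj["@id"]
    return [(subject, p, o) for p, o in obj.items() if p != "@id"]


def parse_first_object(text):
    """Parse one JSON value and return it with whatever text follows it."""
    text = text.lstrip()
    obj, end = json.JSONDecoder().raw_decode(text)
    return obj, text[end:]


def check_trailing_junk(rest):
    """Fail if anything other than whitespace follows the JSON object."""
    if rest.strip():
        raise ValueError("trailing junk after JSON object")


def read_triples_then_check(text, sink):
    obj, rest = parse_first_object(text)  # 1a/ Parse JSON
    sink.extend(to_triples(obj))          # 2a/ Do all triples
    check_trailing_junk(rest)             # 3a/ Check for trailing junk


def read_check_then_triples(text, sink):
    obj, rest = parse_first_object(text)  # 1b/ Parse JSON
    check_trailing_junk(rest)             # 2b/ Check for trailing junk
    sink.extend(to_triples(obj))          # 3b/ Do all triples


for reader in (read_triples_then_check, read_check_then_triples):
    sink = []
    try:
        reader(DOC, sink)
    except ValueError as err:
        print(reader.__name__, "failed:", err, "| triples in sink:", len(sink))
```

Both readers raise on the sample input; the difference is only in what has reached the sink by then (one triple in the first ordering, none in the second), which is the difference a test of the form "valid data followed by junk" can end up depending on.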
