[ https://issues.apache.org/jira/browse/ANY23-49?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lewis John McGibbney updated ANY23-49:
--------------------------------------
Affects Version/s: 0.7.0
Fix Version/s: 0.7.0
> N3/NQ parsers ignoring stopAtFirstError flag
> --------------------------------------------
>
> Key: ANY23-49
> URL: https://issues.apache.org/jira/browse/ANY23-49
> Project: Apache Any23
> Issue Type: Bug
> Affects Versions: 0.7.0
> Environment: Any23 0.6.1 and repository
> Reporter: Hannes Mühleisen
> Fix For: 0.7.0
>
> Attachments: RobustNquadsParser.java
>
>
> The base interface for all RDF parsers (org.openrdf.rio.RDFParser) defines a
> method setStopAtFirstError. The documentation for this method reads: "Sets
> whether the parser should stop immediately if it finds an error in the data".
> This is very useful, as many data sets "out there" contain some malformed
> entries.
> However, as far as I can tell from the current source code (0.6.1 and SVN
> trunk), the NQuadsParser (org.deri.any23.parser.NQuadsParser) ignores this
> flag. In its original implementation, it runs through the entire input in an
> unchecked loop like this:
>     while (parseLine(fileReader)) {
>         nextRow();
>     }
> Now, if parsing any line in a potentially huge file throws an exception,
> the entire parsing process stops regardless of the setting of the
> "stopAtFirstError" flag. I propose that these loops be changed to honor this
> flag, so that when it is set to "false", the rest of the line is discarded
> and parsing continues with the next line.
> I have implemented this behavior on the latest version of NQuadsParser from
> SVN (r1601); the source file is attached. I have changed the parseLine()
> method as follows:
>     private boolean parseLine(BufferedReader br)
>             throws IOException, RDFParseException, RDFHandlerException {
>         // [...]
>         try {
>             // [...]
>             // notifyStatement moved into try block
>             notifyStatement(sub, pred, obj, graph);
>         } catch (EOS eos) {
>             reportFatalError("Unexpected end of line.", row, col);
>             throw new IllegalStateException();
>         } catch (IllegalArgumentException iae) {
>             if (!stopAtFirstError()) {
>                 // remove remainder of broken line
>                 consumeBrokenLine(br);
>                 // notify parse error listener
>                 reportError(iae.getMessage(), row, col);
>             } else {
>                 throw new RDFParseException(iae);
>             }
>         }
>         // [...]
>     }
>     private void consumeBrokenLine(BufferedReader br) throws IOException {
>         char c;
>         while (true) {
>             mark(br);
>             c = readChar(br);
>             if (c == '\n') {
>                 return;
>             }
>         }
>     }
> It would be great if this or similar changes found their way into the
> various Any23 RDF parsers.
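
The proposed behavior can be illustrated independently of Any23. The following is a minimal, self-contained sketch, not the attached RobustNquadsParser.java: the class name LenientLineParser, the toy parseLine() rule (three terms followed by " ."), and the use of IOException in place of RDFParseException are all assumptions made for illustration. It only shows how a per-line loop might honor a stopAtFirstError flag by recording the error and moving on to the next line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of Any23 or Sesame.
class LenientLineParser {

    // Toy stand-in for real statement parsing (assumption): a valid line
    // consists of exactly three whitespace-separated terms followed by ".".
    static String parseLine(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length != 4 || !parts[3].equals(".")) {
            throw new IllegalArgumentException("Malformed statement: " + line);
        }
        return parts[0] + " " + parts[1] + " " + parts[2];
    }

    // Parses the input line by line. If stopAtFirstError is true, the first
    // malformed line aborts parsing; otherwise the error is recorded in
    // 'errors', the broken line is skipped, and parsing continues.
    static List<String> parse(Reader in, boolean stopAtFirstError, List<String> errors)
            throws IOException {
        List<String> statements = new ArrayList<>();
        BufferedReader br = new BufferedReader(in);
        String line;
        int row = 0;
        while ((line = br.readLine()) != null) {
            row++;
            if (line.trim().isEmpty()) {
                continue; // ignore blank lines
            }
            try {
                statements.add(parseLine(line));
            } catch (IllegalArgumentException iae) {
                if (stopAtFirstError) {
                    // IOException stands in for RDFParseException here
                    throw new IOException("Parse error at row " + row, iae);
                }
                // readLine() already consumed the whole broken line,
                // so only the error needs to be reported.
                errors.add("row " + row + ": " + iae.getMessage());
            }
        }
        return statements;
    }

    public static void main(String[] args) throws IOException {
        String input = "s1 p1 o1 .\nbroken line\ns2 p2 o2 .\n";
        List<String> errors = new ArrayList<>();
        List<String> statements = parse(new StringReader(input), false, errors);
        System.out.println("statements: " + statements);
        System.out.println("errors: " + errors);
    }
}
```

Because this sketch reads whole lines with readLine(), it needs no counterpart to consumeBrokenLine(); a character-based parser such as the one quoted above has to skip the remainder of the broken line explicitly.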