[ https://issues.apache.org/jira/browse/ANY23-49?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hannes Mühleisen updated ANY23-49:
----------------------------------

    Comment: was deleted

(was: Adaptation of NQuadsParser)
    
> N3/NQ parsers ignoring stopAtFirstError flag
> --------------------------------------------
>
>                 Key: ANY23-49
>                 URL: https://issues.apache.org/jira/browse/ANY23-49
>             Project: Apache Any23
>          Issue Type: Bug
>         Environment: Any23 0.6.1 and repository
>            Reporter: Hannes Mühleisen
>         Attachments: RobustNquadsParser.java
>
>
> The base interface for all RDF parsers (org.openrdf.rio.RDFParser) defines a
> method setStopAtFirstError. The documentation for this method reads "Sets
> whether the parser should stop immediately if it finds an error in the data".
> This is indeed very useful, as many data sets "out there" contain some
> malformed entries.
> However, as far as I can tell from the current source code (0.6.1 and SVN
> trunk), the NQuadsParser (org.deri.any23.parser.NQuadsParser) ignores this
> flag. In its original implementation, it runs through the entire input in an
> unchecked loop like this:
> while(parseLine(fileReader)) {
>     nextRow();
> }
> Now, if parsing any line in a potentially huge file throws an exception,
> the entire parsing process stops regardless of the setting of the
> "stopAtFirstError" flag. I propose changing these loops to honor this
> flag, so that when it is set to "false", the rest of the broken line is
> discarded and parsing continues with the next line.
> I have implemented this behavior in the latest version of NQuadsParser from
> SVN (r1601); the source file is attached. I have changed the parseLine()
> method as follows:
> private boolean parseLine(BufferedReader br) throws IOException,
>                       RDFParseException, RDFHandlerException {
>     // [...]
>     try {
>         // [...]
>         // notifyStatement() moved into the try block
>         notifyStatement(sub, pred, obj, graph);
>     } catch (EOS eos) {
>         reportFatalError("Unexpected end of line.", row, col);
>         throw new IllegalStateException(); // not reached: reportFatalError() always throws
>     } catch (IllegalArgumentException iae) {
>         if (!stopAtFirstError()) {
>             // remove remainder of broken line
>             consumeBrokenLine(br);
>             // notify parse error listener
>             reportError(iae.getMessage(), row, col);
>         } else {
>             throw new RDFParseException(iae);
>         }
>     }
>     // [...]
> }
> // Discards the rest of a malformed line, up to and including the next '\n'.
> private void consumeBrokenLine(BufferedReader br) throws IOException {
>     char c;
>     while (true) {
>         mark(br);
>         c = readChar(br);
>         if (c == '\n') {
>             return;
>         }
>     }
> }
> It would be great if this change, or a similar one, found its way into the
> various Any23 RDF parsers.
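The behavior proposed in the report above (discard the rest of a malformed line and keep going unless stopAtFirstError is set) can be illustrated with a small, self-contained Java sketch. It deliberately does not use Any23 or Sesame: LenientLineParser, its toy parseLine() check, and LineParseException (standing in for RDFParseException) are hypothetical names chosen for illustration, and the toy check is nowhere near a real N-Quads grammar. Because the sketch reads whole lines with BufferedReader.readLine(), discarding the remainder of a broken line comes for free, whereas the character-based NQuadsParser needs an explicit helper such as consumeBrokenLine().

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch, not Any23 or Sesame code: a line-oriented parser whose
 * main loop honors a stopAtFirstError flag instead of aborting on any error.
 */
final class LenientLineParser {

    /** Hypothetical checked exception, standing in for RDFParseException. */
    static final class LineParseException extends Exception {
        LineParseException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    private final boolean stopAtFirstError;
    private final List<String> errors = new ArrayList<>();

    LenientLineParser(boolean stopAtFirstError) {
        this.stopAtFirstError = stopAtFirstError;
    }

    /** Toy check (NOT real N-Quads): 3 or 4 whitespace-separated terms followed by '.'. */
    private static String[] parseLine(String line) {
        String trimmed = line.trim();
        if (!trimmed.endsWith(".")) {
            throw new IllegalArgumentException("missing terminating '.'");
        }
        String body = trimmed.substring(0, trimmed.length() - 1).trim();
        String[] terms = body.split("\\s+");
        if (terms.length < 3 || terms.length > 4) {
            throw new IllegalArgumentException("expected 3 or 4 terms, got " + terms.length);
        }
        return terms;
    }

    /** Parses every non-blank line and returns the statements that parsed cleanly. */
    List<String[]> parse(BufferedReader in) throws IOException, LineParseException {
        List<String[]> statements = new ArrayList<>();
        String line;
        int row = 0;
        while ((line = in.readLine()) != null) {
            row++;
            if (line.trim().isEmpty()) {
                continue;
            }
            try {
                statements.add(parseLine(line));
            } catch (IllegalArgumentException iae) {
                String message = "line " + row + ": " + iae.getMessage();
                if (stopAtFirstError) {
                    // Strict mode: abort the whole parse on the first malformed line.
                    throw new LineParseException(message, iae);
                }
                // Lenient mode: readLine() has already consumed the broken line,
                // so record the error and continue with the next one.
                errors.add(message);
            }
        }
        return statements;
    }

    List<String> errors() {
        return errors;
    }

    public static void main(String[] args) throws Exception {
        String data = "<a> <b> <c> .\n<a> <b> broken\n<d> <e> <f> <g> .\n";
        LenientLineParser lenient = new LenientLineParser(false);
        List<String[]> ok = lenient.parse(new BufferedReader(new StringReader(data)));
        // prints "2 statements, errors: [line 2: missing terminating '.']"
        System.out.println(ok.size() + " statements, errors: " + lenient.errors());
    }
}
```

With stopAtFirstError set to true, the same input makes parse() throw at line 2, which is how the unchecked loop described in the report behaves whatever the flag says. Collecting errors in a list is only a stand-in for the real mechanism, where reportError() would hand them to a registered parse error listener.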
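One detail of consumeBrokenLine() in the quoted patch deserves care: it stores the result of readChar() in a char and loops until it sees '\n'. How readChar() treats end of input is not shown in the snippet, so the outcome for a final malformed line with no trailing newline is uncertain. If end of stream is simply the -1 from Reader.read() cast to char, the loop can never exit. If readChar() instead throws an end-of-input exception, that exception escapes from inside the catch block. The following is a minimal EOF-safe variant, written as a standalone hypothetical helper (LineSkipper.skipRestOfLine is not an Any23 method). It keeps the raw int from Reader.read() and checks for -1 explicitly.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/** Hypothetical helper, not an Any23 API: an EOF-safe take on consumeBrokenLine(). */
final class LineSkipper {

    private LineSkipper() {
    }

    /**
     * Discards characters up to and including the next line feed.
     *
     * @return true if a line feed was consumed, false if the stream ended first
     */
    static boolean skipRestOfLine(Reader in) throws IOException {
        int c;
        // Keep the raw int: read() returns -1 at end of stream, and casting that
        // to char would turn it into an ordinary-looking character.
        while ((c = in.read()) != -1) {
            if (c == '\n') {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        Reader in = new StringReader("bad <x> <y>\n<s> <p> <o> .\ntrailing garbage");
        System.out.println(skipRestOfLine(in)); // true: consumed "bad <x> <y>" and its line feed
        System.out.println(skipRestOfLine(in)); // true: consumed "<s> <p> <o> ." and its line feed
        System.out.println(skipRestOfLine(in)); // false: input ended without a line feed
    }
}
```

A '\r' from Windows line endings is discarded together with the rest of the line. A lone '\r' used as a line terminator is not treated as end of line, which matches the original loop. Returning a boolean lets the caller distinguish a skipped line from a truncated final line.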

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

