Hi Monika, (Hi Ian),
Ian has already answered your question.

However, I want to add a similar use case of ours, concerning errors in
malformed RDF input files. When loading large RDF files we typically use
N-Triples or N-Quads, and we want to continue parsing the file even if
there are a few errors (i.e. invalid lines).

We use RIOT. Although there is no feature to tell the parser to ignore an
error, skip the line and continue parsing, it's not expensive to construct
a LangNQuads object for each line of your input. So, this is what we often
do:

    // One line of the input file.
    String line = ...
    // Build a tokenizer and a parser over this single line only.
    Tokenizer tokenizer = TokenizerFactory.makeTokenizerString(line);
    LangNQuads parser = new LangNQuads(tokenizer, profile, sink);
    parser.parse();

You can then catch any exception and continue with the next line.
We do the same when we write MapReduce jobs, for example here [1] or
here [2]. (*)
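
For illustration, here is a minimal sketch of that loop using only the
Java standard library. The class LenientLineLoader, its Result type and
the parseLine callback are hypothetical names made up for this example;
parseLine stands in for building a fresh LangNQuads over the line and
calling parse(), and the sketch assumes the parser reports errors with
unchecked exceptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Sketch: parse each line on its own so one bad line cannot abort the load. */
class LenientLineLoader {

    /** Outcome of a load: number of accepted lines plus one note per rejected line. */
    static final class Result {
        int accepted = 0;
        final List<String> rejected = new ArrayList<>();
    }

    /**
     * Feeds every non-blank line to parseLine and keeps going on failure.
     * parseLine is a hypothetical stand-in for the per-line RIOT parser call.
     */
    static Result load(Iterable<String> lines, Consumer<String> parseLine) {
        Result result = new Result();
        int lineNumber = 0;
        for (String line : lines) {
            lineNumber++;
            if (line.trim().isEmpty()) {
                continue; // blank lines carry no statement
            }
            try {
                parseLine.accept(line);
                result.accepted++;
            } catch (RuntimeException e) {
                // Record the problem and move on to the next line.
                result.rejected.add("line " + lineNumber + ": " + e.getMessage());
            }
        }
        return result;
    }
}
```

The rejected list can be logged as warnings once the load has finished,
so the program reports the bad lines without stopping.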

Maybe it's not that difficult to add a feature to RIOT's LangNQuads parser
to report errors, skip to the next line and continue parsing. However, I
think this is close to impossible for the RDF/XML or Turtle
serializations, which are not line-based.

Paolo

 [1] https://github.com/castagna/tdbloader3/blob/master/src/main/java/com/talis/labs/tdb/tdbloader3/FirstMapper.java
 [2] https://github.com/castagna/tdbloader3/blob/master/src/main/java/com/talis/labs/tdb/tdbloader3/io/QuadRecordReader.java


(*)
By the way, if someone wants to help me remove the bottleneck caused by
my using a single reducer in the first MapReduce job of tdbloader3, or
has ideas on how it could be done, let me know.

Monika Solanki wrote:
> Is it possible to check if the incoming data is legal RDF before reading
> into the model? I do not want my program to throw an error via
> RDFDefaultErrorHandler if the incoming data is illegal RDF. I only want
> a warning to be issued and the program should continue execution. If
> there are any  supporting examples, that would be very helpful.
> 
> Thanks,
> 
> Monika
