Timothy Higinbottom created JENA-2083:
-----------------------------------------

             Summary: Support skipping/ignoring errors with tdbloader
                 Key: JENA-2083
                 URL: https://issues.apache.org/jira/browse/JENA-2083
             Project: Apache Jena
          Issue Type: New Feature
          Components: TDB, TDB2
            Reporter: Timothy Higinbottom


Hi all,

I have a fairly large number (~22,000) of N-Triples files that I hope to import 
into TDB2 and query with Fuseki.

I increased the memory allotted to the JVM and used the parallel mode of 
tdb2.tdbloader. This whizzed through the first 1,000 files.

However, some of the files are incorrectly serialized and cause errors when 
Jena tries to read them. It is not feasible right now to separate the 
defective files from the good ones before running tdbloader.

It would be great if tdbloader had an option to skip files that fail to parse, 
so that it can continue processing the remaining files.

The main reason this should be part of tdbloader itself is that the alternative 
(running xargs or a loop in Bash) hurts performance: the files are then loaded 
one at a time, so the user can't take advantage of the tdbloader modes and 
batching.
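
For concreteness, the loop-based alternative mentioned above might look roughly 
like the sketch below. The function name load_each, the LOADER override and the 
failed-files list are made up for illustration, and the sketch assumes that 
tdb2.tdbloader exits with a non-zero status when a file fails to parse. Each 
file gets its own loader run, which is exactly why the loader modes and 
batching are lost.

```shell
#!/usr/bin/env bash
# Sketch only: load_each, LOADER and the failed-files list are hypothetical names.
# Assumes the loader exits non-zero when a file cannot be parsed.

# Usage: load_each DB_DIR FAILED_LIST FILE...
load_each() {
  local db="$1" failed="$2"
  shift 2
  : > "$failed"                      # start with an empty failure list
  local f
  for f in "$@"; do
    # One loader run per file: a bad file cannot abort the rest,
    # but every file pays the loader start-up cost.
    if ! "${LOADER:-tdb2.tdbloader}" --loc "$db" "$f"; then
      printf '%s\n' "$f" >> "$failed"
    fi
  done
}
```

It could be called as `load_each /path/to/DB failed.txt /path/to/files/*.nt` 
(placeholder paths); afterwards failed.txt lists the files that were skipped.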

Thanks for this great project!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
