Timothy Higinbottom created JENA-2083:
-----------------------------------------
Summary: Support skipping/ignoring errors with tdbloader
Key: JENA-2083
URL: https://issues.apache.org/jira/browse/JENA-2083
Project: Apache Jena
Issue Type: New Feature
Components: TDB, TDB2
Reporter: Timothy Higinbottom
Hi all,
I have a fairly large number (~22,000) of N-Triples files that I hope to import
into TDB2 and query with Fuseki.
I increased the memory allotted to the JVM and used the parallel mode of
tdb2.tdbloader. This whizzed through the first 1,000 files.
However, some of the files are incorrectly serialized, so they caused errors
when Jena tried to read them. It is not feasible right now to separate the
defective files from the good ones before running tdbloader.
It would be great if tdbloader had an option to skip files that fail to parse,
so that it can continue processing the remaining files.
The main reason this should be part of tdbloader itself is that the alternative
(running xargs or a loop in Bash, with one tdbloader invocation per file)
reduces performance: loading becomes effectively sequential, and the user can't
take advantage of the tdbloader modes and batching.
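For reference, the per-file alternative described above looks roughly like the
sketch below (written in Python rather than Bash). It is an illustration under
assumptions, not part of Jena: it assumes tdb2.tdbloader is on the PATH, that it
accepts --loc followed by the file to load, and that it exits with a non-zero
status when a file fails to parse. The names load_one_by_one and run_loader are
hypothetical helpers.

```python
import subprocess
from pathlib import Path
from typing import Callable, Iterable, List, Sequence, Tuple


def run_loader(cmd: Sequence[str]) -> int:
    """Run one loader command and return its exit status."""
    return subprocess.run(cmd).returncode


def load_one_by_one(
    files: Iterable[Path],
    db_dir: str,
    run: Callable[[Sequence[str]], int] = run_loader,
) -> Tuple[List[Path], List[Path]]:
    """Load each file in its own tdbloader run, skipping files that fail.

    Assumption: a non-zero exit status means the file could not be loaded.
    Returns (loaded, failed) so the defective files can be inspected later.
    """
    loaded: List[Path] = []
    failed: List[Path] = []
    for f in files:
        # One process per file: this is what makes the workaround slow,
        # because the loader never sees more than one file at a time.
        cmd = ["tdb2.tdbloader", "--loc", db_dir, str(f)]
        if run(cmd) == 0:
            loaded.append(f)
        else:
            failed.append(f)
    return loaded, failed
```

Calling load_one_by_one(sorted(Path("data").glob("*.nt")), "DB2") (both paths
are placeholders) keeps going past bad files, but every file pays the start-up
cost of a separate loader process, which is the overhead a built-in skip option
would avoid.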
Thanks for this great project!