[
https://issues.apache.org/jira/browse/JENA-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andy Seaborne closed JENA-2083.
-------------------------------
> Support skipping/ignoring errors with tdbloader
> -----------------------------------------------
>
> Key: JENA-2083
> URL: https://issues.apache.org/jira/browse/JENA-2083
> Project: Apache Jena
> Issue Type: New Feature
> Components: TDB, TDB2
> Reporter: Timothy Higinbottom
> Priority: Major
>
> Hi all,
> I have a fairly large number (~22,000) of N-Triples files that I hope to
> import into TDB2 to query with Fuseki.
> I boosted the RAM allotted to the JVM and used the parallel mode from
> tdb2.tdbloader. This whizzed through the first 1,000 of the files.
> However, some of the files are incorrectly serialized, so they caused errors
> when Jena tried to read them. It is not feasible right now to sort out the
> defective files from the good ones before running tdbloader.
> It would be great if tdbloader could add an option to skip files that fail
> to parse so that it can continue to process the other files.
> The main reason this should be part of tdbloader itself is that the
> alternative (running xargs or a loop in Bash) decreases performance: each
> file is then loaded in its own run, effectively sequentially, and the user
> can't take advantage of the tdbloader modes and batching.
> Thanks for this great project!
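
For illustration, the shell-loop alternative mentioned in the report might look roughly like the sketch below. It is not part of the issue and is not a tdbloader feature. It assumes that tdb2.tdbloader exits with a non-zero status when a file fails to parse; load_each and the LOADER variable are hypothetical names introduced here. Each file gets its own loader run, which is exactly why this approach gives up the loader's batching.

```shell
# Per-file fallback: one loader run per file, so a bad file only fails its own run.
# LOADER is a hypothetical override; by default the real tdb2.tdbloader is used.
# Assumption: the loader returns a non-zero exit status on a parse error.
load_each() {
  db=$1
  shift
  for f in "$@"; do
    # Send loader output to stderr so stdout carries only the skipped file names.
    if ! "${LOADER:-tdb2.tdbloader}" --loc "$db" "$f" >&2; then
      printf '%s\n' "$f"
    fi
  done
}
```

A call such as `load_each DB2 data/*.nt > skipped.txt` (paths are placeholders) would leave a list of the files that need fixing, at the cost of one loader start-up per file.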
--
This message was sent by Atlassian Jira
(v8.3.4#803005)