[
https://issues.apache.org/jira/browse/JENA-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17326364#comment-17326364
]
Andy Seaborne commented on JENA-2083:
-------------------------------------
Hi [~timhigins]
What sort of errors are we talking about here?
If they are IRI errors, which version of Jena are you using? (JENA-2094 is
not just about '@'.)
> Support skipping/ignoring errors with tdbloader
> -----------------------------------------------
>
> Key: JENA-2083
> URL: https://issues.apache.org/jira/browse/JENA-2083
> Project: Apache Jena
> Issue Type: New Feature
> Components: TDB, TDB2
> Reporter: Timothy Higinbottom
> Priority: Major
>
> Hi all,
> I have a fairly large (~22,000) number of N-Triples files I hope to import
> into TDB2 to query with Fuseki.
> I boosted the RAM allotted to the JVM and used the parallel mode from
> tdb2.tdbloader. This whizzed through the first 1,000 of the files.
> However, some of the files are incorrectly serialized, so they caused errors
> when Jena tried to read them. It is not feasible right now to sort out the
> defective files from the good ones before running tdbloader.
> It would be great if tdbloader had an option to skip files that fail to
> parse, so that it can continue processing the other files.
> The main reason this should be part of tdbloader itself is that the
> alternative (running xargs or a loop in Bash) decreases performance: each
> file is then loaded in its own run, one after another, so the user can't
> take advantage of the tdbloader modes and batching.
> Thanks for this great project!
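
For illustration only, here is a rough sketch of the pre-filtering workaround the description alludes to: check each file first, then hand all the good ones to a single tdb2.tdbloader run so the parallel loader can still batch. It is not an existing tdbloader feature. The helper names (riot_validates, partition_files, loader_command) are hypothetical, and the sketch assumes that `riot --validate` exits with a non-zero status when a file fails to parse, which should be confirmed for the Jena version in use.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Tuple


def riot_validates(path: str) -> bool:
    """Return True if `riot --validate` exits with status 0 for this file.

    Assumption: a non-zero exit status means the file failed to parse.
    """
    result = subprocess.run(
        ["riot", "--validate", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def partition_files(
    paths: Iterable[str],
    check: Callable[[str], bool] = riot_validates,
    workers: int = 4,
) -> Tuple[List[str], List[str]]:
    """Split paths into (good, bad) according to `check`, keeping input order."""
    paths = list(paths)
    # Threads are enough here: each check spends its time waiting on a subprocess.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(check, paths))
    good = [p for p, ok in zip(paths, results) if ok]
    bad = [p for p, ok in zip(paths, results) if not ok]
    return good, bad


def loader_command(db_dir: str, files: List[str]) -> List[str]:
    """Build one tdb2.tdbloader invocation covering all the good files."""
    return ["tdb2.tdbloader", "--loader=parallel", "--loc", db_dir, *files]
```

A caller would pass the list of .nt files to partition_files, log the bad list for later repair, and run the result of loader_command with subprocess.run. The trade-off is that every file gets parsed twice, and with roughly 22,000 paths a single command line may run into the operating system's argument-length limit, so this is a stopgap rather than a substitute for the requested option.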