[jira] [Commented] (JENA-2083) Support skipping/ignoring errors with tdbloader

Andy Seaborne (Jira) Wed, 21 Apr 2021 01:37:05 -0700


    [ 
https://issues.apache.org/jira/browse/JENA-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17326364#comment-17326364
 ]


Andy Seaborne commented on JENA-2083:
-------------------------------------

Hi [~timhigins]

What sort of errors are we talking about here? 

If they are IRI errors, which version of Jena are you using? (JENA-2094 is 
not just about '@'.)


> Support skipping/ignoring errors with tdbloader
> -----------------------------------------------
>
>                 Key: JENA-2083
>                 URL: https://issues.apache.org/jira/browse/JENA-2083
>             Project: Apache Jena
>          Issue Type: New Feature
>          Components: TDB, TDB2
>            Reporter: Timothy Higinbottom
>            Priority: Major
>
> Hi all,
> I have a fairly large (~22,000) number of N-Triples files I hope to import 
> into TDB2 to query with Fuseki.
> I boosted the RAM allotted to the JVM and used the parallel mode from 
> tdb2.tdbloader. This whizzed through the first 1,000 of the files.
> However, some of the files are incorrectly serialized, so they caused errors 
> when Jena tried to read them. It is not feasible right now to sort out the 
> defective files from the good ones before running tdbloader.
> It would be great if tdbloader could add an option to skip files that fail 
> to parse so that it can continue to process the other files.
> The main reason this should be part of tdbloader itself is that the 
> alternative (running xargs or a loop in Bash) decreases performance because 
> then the loading is effectively synchronous and the user can't take advantage 
> of the tdbloader modes and batching.
> Thanks for this great project!
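
One possible workaround for the request quoted above, sketched under assumptions rather than taken from tdbloader itself: check each file separately first, then pass every file that passes to a single tdb2.tdbloader run, so the parallel loader still sees one batch. The helper names (partition_files, riot_ok, loader_command) are hypothetical. The sketch also assumes that riot is on the PATH and that `riot --validate` exits with a non-zero status when a file fails to parse; that behaviour should be confirmed for the Jena version in use.

```python
import subprocess
from typing import Callable, Iterable, List, Tuple


def partition_files(
    paths: Iterable[str], is_valid: Callable[[str], bool]
) -> Tuple[List[str], List[str]]:
    """Split paths into (good, bad) according to the supplied check."""
    good: List[str] = []
    bad: List[str] = []
    for path in paths:
        (good if is_valid(path) else bad).append(path)
    return good, bad


def riot_ok(path: str) -> bool:
    """Hypothetical check: run `riot --validate` on one file.

    Assumption: a parse error produces a non-zero exit status.
    """
    result = subprocess.run(
        ["riot", "--validate", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def loader_command(db_dir: str, good: List[str]) -> List[str]:
    """Build one tdb2.tdbloader invocation covering every file that passed."""
    return ["tdb2.tdbloader", "--loader=parallel", "--loc", db_dir, *good]
```

The command list from loader_command could be run with subprocess.run once the bad files have been set aside. With tens of thousands of paths the argument list may exceed the operating system's limit, so it might need to be split into a few large batches. Validating first also means every good file is parsed twice, which is a cost a built-in skip option would avoid.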



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
