[ https://issues.apache.org/jira/browse/JENA-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andy Seaborne resolved JENA-1325.
---------------------------------
Resolution: Won't Fix
Assignee: Andy Seaborne
> RIOT parse many files at once, output only valid ones
> -----------------------------------------------------
>
> Key: JENA-1325
> URL: https://issues.apache.org/jira/browse/JENA-1325
> Project: Apache Jena
> Issue Type: Improvement
> Components: RIOT
> Environment: GNU/Linux
> Reporter: Laura
> Assignee: Andy Seaborne
>
> This issue is related to JENA-1322:
> https://issues.apache.org/jira/browse/JENA-1322
> I have a folder with thousands of files, mostly small RDF/XML files. I'm
> using RIOT to validate them and dump the valid ones into N-Triples files.
> Invoking RIOT once per file is not workable: the per-invocation overhead
> makes the whole operation too slow (hours).
> So I've tried calling RIOT only once on all files together using
> {noformat}
> riot \
> --verbose \
> --stop \
> --check \
> --strict \
> --output=nt \
> files/*.rdf > files.nt
> {noformat}
> and this way validation is much faster. The problem is that it still
> writes invalid files to the .nt output file. I'm downloading these files
> from the Internet, so I'm not going to fix them myself; I just want to skip
> bad files.
> Now, to be clear, I understand that RIOT is of course not meant to fix bad
> data, and I'm not asking for this. I'm suggesting however to add an
> *--option* such that RIOT can do the following:
> 1. parse multiple files at once (so that there is no need to invoke the same
> RIOT command for each file)
> 2. for every file, check/validate it
> 3. if *--output* is set, only output those files or triples that didn't raise
> any ERROR
> I think this is well within the scope of RIOT's functionality. Could this
> option please be added to RIOT?
> Thank you.
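
For reference, the three steps requested above can be approximated outside
RIOT with a small wrapper. The following is only a sketch, not part of RIOT.
It assumes that the validating command exits with a non-zero status when a
file contains errors and writes that file's N-Triples to stdout. The function
name convert_valid and the VALIDATOR variable are made up for this example;
the riot flags are copied from the command in the report. Note that it still
starts the command once per file, so it shows the "skip bad files" behaviour
but does not remove the startup overhead described in the report.

```shell
#!/bin/sh
# Sketch: convert files one by one and keep output only from files that validate.
# Assumption: the command in VALIDATOR exits non-zero when a file has errors
# and prints the N-Triples for that file on stdout.
VALIDATOR=${VALIDATOR:-"riot --check --strict --output=nt"}

# convert_valid OUTPUT FILE...  (hypothetical helper, not a RIOT feature)
convert_valid() {
    out=$1
    shift
    tmp=$(mktemp) || return 1
    : > "$out"                      # start with an empty output file
    for f in "$@"; do
        # Buffer each file's output so that an error found late in the file
        # cannot leave partial triples in the combined output.
        if $VALIDATOR "$f" > "$tmp" 2>/dev/null; then
            cat "$tmp" >> "$out"
        else
            echo "skipped: $f" >&2  # report the bad file, keep going
        fi
    done
    rm -f "$tmp"
}

# Example call (commented out so that sourcing this file runs nothing):
# convert_valid files.nt files/*.rdf
```

Buffering per file is the design choice that matters here: appending directly
to the combined output would keep whatever triples were emitted before the
first error in a bad file.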