Dear all,

We are using MarkLogic Content Pump (mlcp) to load content from many directories containing zip files, which in turn contain .xml files. Following our last exchange with Geet, we are also using the transform option so that only XML content is ingested. This suggested filtering approach using a transform works.
Unfortunately, when mlcp encounters a corrupt zip file (which we may receive from our sources), the whole process terminates. Is there an option that tells mlcp to keep going, that is, to skip the corrupt zip file and continue processing the remaining zip files in our large, deeply nested directories? It looks like the -tolerate_errors option won't help here: we need a transform to ingest only the XML files, and using a transform forces the batch size to 1. Please advise.

We are using the following options:

    -input_file_path $inputFilePath \
    -mode local -input_compressed true \
    -output_uri_replace "(\/.+\/+)(?=.+\.zip),'/ourOverrideOfTheURIToRemoveTheLeadingNASPath/'" \
    -output_collections "$collections" \
    -database $dbName -output_permissions ... -transform_module /ourNamespace/ourTransformModule.xqy \
    -transform_namespace "http://cas.org/..." \
    -xml_repair_level full \

Thank you,

________________________________
Kristina Morales-Martin
Sr. Technical Information Specialist, e-Content Operations
CAS, a division of the American Chemical Society
2540 Olentangy River Road
Columbus, OH 43202
Phone: 614-447-3600, ext. 2322
Fax: 614-447-3827
www.cas.org
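Independent of any mlcp option, one stopgap is to screen the archives before the run and move any unreadable ones out of $inputFilePath. The sketch below is a minimal Python 3 example under stated assumptions: it uses only the standard zipfile module, the helper name find_corrupt_zips is hypothetical, and it assumes that an archive Python cannot open or verify is also one that would stop mlcp (not confirmed).

```python
import os
import zipfile
import zlib

# Exceptions treated as "this archive cannot be read cleanly".
# Assumption: these roughly cover the corruption cases that stop the mlcp run.
_UNREADABLE = (zipfile.BadZipFile, zlib.error, EOFError, OSError,
               NotImplementedError, RuntimeError)


def find_corrupt_zips(root):
    """Hypothetical helper: walk `root` recursively and return the sorted
    paths of .zip files that fail to open or fail an integrity check."""
    corrupt = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(".zip"):
                continue  # only archives are checked; other files are ignored
            path = os.path.join(dirpath, name)
            try:
                with zipfile.ZipFile(path) as zf:
                    # testzip() reads every member and verifies its CRC-32;
                    # it returns the first bad member's name, or None if all pass.
                    if zf.testzip() is not None:
                        corrupt.append(path)
            except _UNREADABLE:
                # Not a zip at all, truncated, bad compressed data,
                # or an unsupported/encrypted member.
                corrupt.append(path)
    return sorted(corrupt)
```

For example, looping over find_corrupt_zips("/path/to/input") (path hypothetical) and printing each result lists the archives to set aside before invoking mlcp. Because testzip() decompresses every member, this pre-pass reads roughly as much data as the load itself.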
_______________________________________________ General mailing list General@developer.marklogic.com Manage your subscription at: http://developer.marklogic.com/mailman/listinfo/general