Dear all,

We are using the MarkLogic Content Pump (mlcp) to load content from many
directories of zip files, which in turn contain .xml files.
Following our last communication with Geet, we are also using the transform
option in order to ingest only XML content.  The suggested approach of
filtering with a transform works.

Unfortunately, when mlcp encounters a corrupt zip file (which we may well
receive from our sources), the process terminates.  Is there an option to
instruct mlcp to keep going, that is, to skip the corrupt zip file and continue
processing the rest of the zip files in our large, deeply nested directories?
It looks like the -tolerate_errors option won't work, given that we need to use
a transform to ingest only XML files, and that forces the batch size to 1.
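
One possible stopgap, independent of any mlcp option, would be to pre-screen
the archives before the import and move the ones that fail an integrity test
out of the input tree.  The following is only a minimal sketch under stated
assumptions: Info-ZIP `unzip` is installed on the loading host, the function
name and the quarantine directory are hypothetical, and `unzip -t` may not flag
exactly the same files that mlcp rejects.

```shell
#!/bin/sh
# Sketch: move zip archives that fail an integrity test out of the mlcp input
# tree. Assumes Info-ZIP `unzip` is on PATH. The function name and the
# quarantine directory are hypothetical; keep the quarantine directory outside
# the input tree so that find does not revisit moved files.
#
# Usage: quarantine_corrupt_zips "$inputFilePath" /path/to/quarantine
quarantine_corrupt_zips() {
  input_dir=${1%/}        # drop one trailing slash so the prefix strip below works
  quarantine_dir=$2

  # Without unzip every archive would look corrupt, so refuse to run.
  command -v unzip >/dev/null 2>&1 || {
    echo "unzip not found on PATH" >&2
    return 1
  }

  # find passes batches of paths to the inline script; paths with spaces are safe.
  find "$input_dir" -type f -name '*.zip' -exec sh -c '
    input_dir=$1; quarantine_dir=$2; shift 2
    for zip_file do
      # unzip -t tests the archive without extracting; any non-zero status
      # (including warnings) is treated as corrupt here.
      if ! unzip -tq "$zip_file" </dev/null >/dev/null 2>&1; then
        rel=${zip_file#"$input_dir"/}          # path relative to the input tree
        dest=$quarantine_dir/$rel
        mkdir -p "$(dirname "$dest")" && mv -- "$zip_file" "$dest" &&
          printf "%s\n" "$zip_file"            # report the original location
      fi
    done
  ' sh "$input_dir" "$quarantine_dir" {} +
}
```

The function prints the original path of each archive it moves and keeps the
relative directory layout under the quarantine directory, so files with the
same name in different directories do not overwrite each other.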

Please advise.

We are using the following options:
-input_file_path $inputFilePath \
-mode local -input_compressed true \
-output_uri_replace "(\/.+\/+)(?=.+\.zip),'/ourOverrideOfTheURIToRemoveTheLeadingNASPath/'" \
-output_collections "$collections" \
-database $dbName -output_permissions ... \
-transform_module /ourNamespace/ourTransformModule.xqy \
-transform_namespace "http://cas.org/..." \
-xml_repair_level full

Thank you,
________________________________
Kristina Morales-Martin
Sr. Technical Information Specialist, e-Content Operations
CAS, a division of the American Chemical Society
2540 Olentangy River Road
Columbus, OH 43202
Phone: 614-447-3600, ext. 2322
Fax: 614-447-3827
www.cas.org


