Addendum: We are actually passing the following regular expression, with the dot escaped, yet the mlcp.sh import still does not filter for our desired files:
-input_file_pattern '.*\.xml'

From: Morales-Martin, Kristina
Sent: Monday, July 13, 2015 11:43 AM
To: '[email protected]'
Subject: mlcp.sh help with filtering to ingest only XML files in zip files

Dear all,

We need help ingesting a directory of many* zip files, each with many* XML files. We are using mlcp (MarkLogic Content Pump) out of the box to import content as-is from a directory of zip files. In particular, we are using these options:

-mode local \
-input_file_path [a directory that has zip files, each zip file has a heterogeneous mix of .xml and other binary files] \
-input_compressed true \
-input_file_pattern '.*.xml' \
-output_uri_replace "(\/.+\/+)(?=.+\.zip),'/ourOverrideOfTheURIToRemoveTheLeadingNASPath/'" \
...

Can anyone help with the -input_file_pattern option? Our intent is to load only the .xml files in the zip files in the directory. We do not want to load other files. For some reason, the -input_file_pattern option is not successfully filtering. If you have encountered this non-filtering behavior, what have you done to make it work?

Thank you,

Kristina Morales-Martin
Sr. Technical Information Specialist, e-Content Operations
CAS, a division of the American Chemical Society
2540 Olentangy River Road
Columbus, OH 43202
Phone: 614-447-3600, ext. 2322
Fax: 614-447-3827
www.cas.org<http://www.cas.org/>
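
As a sanity check on the two patterns quoted above, here is a minimal sketch that compares the unescaped and escaped forms against a few names. The file names are hypothetical, and the helper `full_match` is an illustration only. It uses `grep -E -x` for whole-name matching, which is an assumption and not necessarily how mlcp's Java regex engine applies the pattern. It also says nothing about whether mlcp applies the pattern to entries inside a zip file.

```shell
#!/bin/sh
# full_match PATTERN NAME
# Prints "match" if the whole NAME matches PATTERN, "no match" otherwise.
# grep -x anchors the pattern to the entire line (whole-string matching).
full_match() {
    pattern=$1
    name=$2
    if printf '%s\n' "$name" | grep -Eqx -- "$pattern"; then
        echo "match"
    else
        echo "no match"
    fi
}

# Hypothetical entry names, chosen to show where the two patterns differ.
for name in report.xml image.png notes_xml data.xmlx; do
    printf '%-12s unescaped: %-9s escaped: %s\n' "$name" \
        "$(full_match '.*.xml' "$name")" \
        "$(full_match '.*\.xml' "$name")"
done
```

The two patterns differ only for a name such as `notes_xml`, where the unescaped dot matches the underscore. Both reject `image.png`, so the escaping alone would not explain binary files being loaded.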
_______________________________________________
General mailing list
[email protected]
Manage your subscription at:
http://developer.marklogic.com/mailman/listinfo/general
