Dear Geert,

Thank you for your help. We will try the workaround that you suggested.

Kristina Morales-Martin
Sr. Technical Information Specialist, e-Content Operations
CAS, a division of the American Chemical Society
2540 Olentangy River Road
Columbus, OH 43202
Phone: 614-447-3600, ext. 2322
Fax: 614-447-3827
www.cas.org<http://www.cas.org/>

From: [email protected] [mailto:[email protected]] On Behalf Of Geert Josten
Sent: Monday, July 13, 2015 1:43 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] mlcp.sh help with filtering to ingest only XML files in zip files

Hi Kristina,

I'm afraid it is being ignored, as it normally applies to the files read from disk. I might have a workaround though. Set up a transform that checks the URI, and if it does not end with .xml, make the transform function return an empty sequence. That will cause MLCP to skip ingest of the file.

Kind regards,
Geert

From: <[email protected]<mailto:[email protected]>> on behalf of "Morales-Martin, Kristina" <[email protected]<mailto:[email protected]>>
Reply-To: MarkLogic Developer Discussion <[email protected]<mailto:[email protected]>>
Date: Monday, July 13, 2015 at 5:45 PM
To: "'[email protected]<mailto:'[email protected]>'" <[email protected]<mailto:[email protected]>>
Subject: Re: [MarkLogic Dev General] mlcp.sh help with filtering to ingest only XML files in zip files

Addendum: We actually send the following regular expression, which escapes the dot, yet mlcp.sh import still does not filter to our desired files:

    -input_file_pattern '.*\.xml'

From: Morales-Martin, Kristina
Sent: Monday, July 13, 2015 11:43 AM
To: '[email protected]<mailto:'[email protected]>'
Subject: mlcp.sh help with filtering to ingest only XML files in zip files

Dear all,

We need help in ingesting a directory of many* zip files, each with many* XML files. We are using mlcp (MarkLogic Content Pump) out of the box to import content as-is from a directory of zip files.
In particular, we are using these options:

    -mode local \
    -input_file_path [a directory that has zip files, each zip file has a heterogeneous mix of .xml and other binary files] \
    -input_compressed true \
    -input_file_pattern '.*.xml' \
    -output_uri_replace "(\/.+\/+)(?=.+\.zip),'/ourOverrideOfTheURIToRemoveTheLeadingNASPath/'" \
    ...

Can anyone help with the -input_file_pattern option? Our intent is to load only the .xml files in the zip files in the directory. We do not want to load other files. For some reason, the -input_file_pattern is not successfully filtering. If you have encountered this non-filtering behavior, what have you done to make it work?

Thank you,
Kristina Morales-Martin

Confidentiality Notice: This electronic message transmission, including any attachment(s), may contain confidential, proprietary, or privileged information from Chemical Abstracts Service ("CAS"), a division of the American Chemical Society ("ACS"). If you have received this transmission in error, be advised that any disclosure, copying, distribution, or use of the contents of this information is strictly prohibited. Please destroy all copies of the message and contact the sender immediately by either replying to this message or calling 614-447-3600.
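
For readers following this thread, here is a small Python sketch of the two points discussed: how the unescaped pattern '.*.xml' differs from the escaped '.*\.xml', and what the URI check in the suggested transform amounts to. It is only an illustration and not mlcp code. The entry names are made up, keep_entry is a hypothetical stand-in for the check (an actual mlcp transform would be a server-side module), and whole-string matching is an assumption about how such a pattern is applied.

```python
import re

# The two patterns from the thread. In the first, the second dot is
# unescaped and therefore matches any single character.
UNESCAPED = re.compile(r".*.xml")
ESCAPED = re.compile(r".*\.xml")


def keep_entry(uri: str) -> bool:
    """Hypothetical stand-in for the transform check: keep only URIs ending in .xml."""
    return uri.endswith(".xml")


# Made-up entry names standing in for the contents of one zip file.
entries = ["doc1.xml", "image.png", "notes_xml", "data/doc2.xml"]

# Assumption: the pattern must match the whole name.
unescaped_hits = [e for e in entries if UNESCAPED.fullmatch(e)]
escaped_hits = [e for e in entries if ESCAPED.fullmatch(e)]
kept = [e for e in entries if keep_entry(e)]

print("unescaped:", unescaped_hits)
print("escaped:  ", escaped_hits)
print("kept:     ", kept)
```

In this sketch the unescaped pattern also accepts "notes_xml", while the escaped pattern and the suffix check agree with each other. That is consistent with the thread: escaping the dot makes the pattern stricter, but it cannot help if the pattern is never applied to the entries inside the zip files, which is why the check moves into a transform.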
_______________________________________________ General mailing list [email protected] Manage your subscription at: http://developer.marklogic.com/mailman/listinfo/general
