Emil Zegers created TIKA-4246:
---------------------------------

             Summary: tika-pipes FileSystemFetcher configuration option for file name/path pattern selection
                 Key: TIKA-4246
                 URL: https://issues.apache.org/jira/browse/TIKA-4246
             Project: Tika
          Issue Type: New Feature
          Components: tika-pipes
            Reporter: Emil Zegers

It would be useful to be able to configure the tika-pipes FileSystemFetcher to process only certain files, e.g. by extension or by a pattern matched against the file name or path. That would make it possible to point the fetcher at a root folder and process only the matching files, such as those with certain extensions or names (for GIS data such as shapefiles, for example, several files share the same name and differ only in extension).

Something like:

<properties>
  <fetchers>
    <fetcher class="org.apache.tika.pipes.fetcher.fs.FileSystemFetcher">
      <params>
        <name>fsf</name>
        <basePath>/my/base/path1</basePath>
        <pattern>myshapefilename.*</pattern>
      </params>
    </fetcher>
  </fetchers>
</properties>

Or:

<pattern>*.doc*,*.pdf</pattern>
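
To make the intended matching concrete, here is a minimal sketch of how a comma-separated list of glob patterns could be evaluated against file names using only the JDK's PathMatcher. The GlobFilter class and its accepts method are hypothetical names used for illustration and are not part of Tika. The sketch also assumes the patterns are globs applied to the file name only, not the full path, which the request above leaves open.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, NOT an existing Tika class: it only illustrates the
// matching semantics a <pattern> parameter might have.
final class GlobFilter {

    private final List<PathMatcher> matchers = new ArrayList<>();

    // Accepts a comma-separated list of globs, e.g. "*.doc*,*.pdf".
    GlobFilter(String commaSeparatedGlobs) {
        for (String glob : commaSeparatedGlobs.split(",")) {
            String trimmed = glob.trim();
            if (!trimmed.isEmpty()) {
                // "glob:" selects the JDK's built-in glob syntax.
                matchers.add(FileSystems.getDefault().getPathMatcher("glob:" + trimmed));
            }
        }
    }

    // Returns true if the file name (last path element) matches any pattern.
    // Assumption: only the file name is tested, not the directory part.
    boolean accepts(Path path) {
        Path fileName = path.getFileName();
        if (fileName == null) {
            return false;
        }
        for (PathMatcher matcher : matchers) {
            if (matcher.matches(fileName)) {
                return true;
            }
        }
        return false;
    }
}
```

With this sketch, new GlobFilter("myshapefilename.*") would accept myshapefilename.shp and myshapefilename.dbf under any directory, and new GlobFilter("*.doc*,*.pdf") would accept both .docx and .pdf files. Whether matching should be case-sensitive, or apply to the path relative to basePath, would still need to be decided.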