Hi Rifat, 

The prefix property filters on the hierarchical key structure that S3 
exposes [1]. I don’t believe S3 offers a native way to apply string 
comparisons or regular expressions to the rest of the object key. 

Rather than a “suffix” property, which only allows object-level filtering 
on one end of the name, I think a “pattern” property which allows regular 
expression matching would be more useful. However, this would need to use the 
Java SDK [2] and operate locally on the S3ObjectSummary objects after they 
have been retrieved from the server. So this could certainly reduce the number 
of flowfiles output by the processor (i.e. only objects whose keys end in .csv 
would be created as flowfiles), but the whole list (e.g. .csv + .pdf + .txt) 
would still be retrieved from S3 and then filtered within the ListS3 processor. 

You can submit a Jira ticket for a feature request here [3]. 

[1] https://docs.aws.amazon.com/AmazonS3/latest/dev/ListingKeysHierarchy.html
[2] https://docs.aws.amazon.com/AmazonS3/latest/dev/ListingObjectKeysUsingJava.html
[3] https://issues.apache.org/jira/secure/CreateIssue!default.jspa
 
Andy LoPresto
[email protected]
[email protected]
He/Him
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On Mar 18, 2020, at 11:49 AM, Evans, Jason D. <[email protected]> 
> wrote:
> 
> Hey Rifat – cool idea.  Is the ListS3 processor something we wrote?  Or are 
> you suggesting contributing to Nifi itself?
>  
> From: "Chowdhury, Rifat" <[email protected]>
> Date: Wednesday, March 18, 2020 at 2:45 PM
> To: "[email protected]" <[email protected]>
> Cc: "Loyot, Brendan" <[email protected]>, "Nagwani, Dheeraj X. -ND" 
> <[email protected]>, "Evans, Jason D." 
> <[email protected]>, "Rudra, Anshuman X. -ND" 
> <[email protected]>
> Subject: Nifi Enhancement Idea
>  
> Hi,
> 
> For processors such as ListS3, would it be beneficial to add a new field 
> called “suffix” so that when I configure my ListS3 processor, I can filter my 
> incoming files further, in addition to the prefix? The current template I 
> follow is ListS3 -> RouteOnAttribute (filename endsWith my suffix criteria), 
> for example when I want only .csv, .tsv, .gz, or .txt files, or any other 
> type of file. Another option is to have the prefix field accept some sort of 
> Java regex pattern matching. What do you guys think? Shall I create a PR for 
> this against the NiFi branch for future releases? I don’t feel the need to 
> create a custom processor for this rather small enhancement.
>  
> Best Regards, Rifat Chowdhury
> Software Engineer, Data Platforms
> 
>  THE WALT DISNEY COMPANY | DIRECT-TO-CONSUMER AND INTERNATIONAL
> [email protected] <mailto:[email protected]>
