You need to extract the relevant fields first. Either modify the flowfile 
content in place (losing the other data) or create a new flowfile (the 
complete content is still retained in the “original” flowfile), then pass the 
flowfile containing only the content you want hashed to the FuzzyHashContent 
processor.

For the data you have provided (I’m assuming this is a single line of values, 
rather than just the layout of a file containing many such lines), you could 
use a ReplaceText processor to drop the unrelated columns. If the flowfile 
content contains multiple rows, you can use a CSVReader/ScriptedReader and a 
CSVRecordSetWriter/ScriptedRecordSetWriter in conjunction with an UpdateRecord 
processor to reduce the content to just the relevant fields, then use a 
SplitRecord processor to generate an individual flowfile for each line, and 
pass all of them to FuzzyHashContent.
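
To make the column-dropping step concrete, here is a minimal Python sketch of 
the transformation only; it is not NiFi configuration. It assumes the columns 
are separated by runs of whitespace and that no value contains whitespace. 
The function names are hypothetical. The same pattern could probably serve as 
the search value in ReplaceText, with $1 and $2 as the back-references 
instead of \1 and \2, but I have not tested that against your data.

```python
import re
from typing import List

# Assumption: fields are separated by runs of whitespace and contain none themselves.
# Group 1 captures the second field (Col2), group 2 the third (Col3).
ROW = re.compile(r"^\S+\s+(\S+)\s+(\S+).*$")


def keep_col2_col3(line: str) -> str:
    """Reduce one row to its second and third fields, separated by one space."""
    return ROW.sub(r"\1 \2", line)


def rows_to_records(content: str) -> List[str]:
    """Reduce every non-blank row; each result stands in for one split flowfile."""
    return [keep_col2_col3(line) for line in content.splitlines() if line.strip()]


sample = "Test1    Test2     Test3     Test4     Test5"
print(keep_col2_col3(sample))  # prints: Test2 Test3
```

Each string returned by rows_to_records corresponds to the content of one 
flowfile you would hand to FuzzyHashContent. Note that a header row, if 
present, is reduced like any other row, so you may want to drop it first.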


Andy LoPresto
[email protected]
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On Oct 9, 2017, at 4:19 AM, shankhamajumdar <[email protected]> 
> wrote:
> 
> Hi Andy,
> 
> Thanks for the reply. But I am still not able to solve my use case. For
> example
> 
> I have a data file in the below structure.
> 
> Col1      Col2      Col3      Col4      Col5
> 
> Test1    Test2     Test3     Test4     Test5
> 
> I want to do a fuzzy matching on Col2 and Col3 and generate an output file.
> 
> I am using getFile and FuzzyHashContent processor but not able to design the
> flow. Need your help on this.
> 
> Regards,
> Shankha
