You need to extract the relevant fields first. Either modify the flowfile content in place (losing the other data) or create a new flowfile (the complete content is still retained in the “original” flowfile). Then pass the flowfile that contains only the content you want hashed to the FuzzyHashContent processor.
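
As a rough illustration of that extraction step, here is a minimal Python sketch run outside NiFi. It is not NiFi configuration. It only shows the transformation the flowfile content should go through before hashing, using the sample row quoted further down. The function names and the whitespace-separated layout are my assumptions.

```python
import re

# Assumption: columns are separated by whitespace, as in the sample row.
# Groups 1 and 2 capture the second and third columns; the rest is dropped.
ROW = re.compile(r"^\S+\s+(\S+)\s+(\S+).*$")


def keep_col2_col3(line: str) -> str:
    """Return only the second and third columns of a whitespace-separated row.

    A row with fewer than three columns does not match and is returned as-is.
    """
    return ROW.sub(r"\1 \2", line.strip())


def reduce_rows(content: str, has_header: bool = True) -> list[str]:
    """Reduce every data row to Col2 and Col3, one output string per row."""
    rows = [ln for ln in content.splitlines() if ln.strip()]
    if has_header:
        rows = rows[1:]  # skip the "Col1 Col2 ..." header line
    return [keep_col2_col3(ln) for ln in rows]


sample = "Col1 Col2 Col3 Col4 Col5\nTest1 Test2 Test3 Test4 Test5\n"
print(reduce_rows(sample))
```

Each string in the returned list corresponds to the content of one flowfile that would be handed to FuzzyHashContent. A similar regular expression, written in Java regex syntax with `$1 $2` as the replacement, should be usable in a text-replacement step inside the flow, but please test it against your real data first.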

For the data you have provided (I’m assuming this is a single line of values, rather than just the structure of a file that contains many lines), you could use a ReplaceText processor to drop the unrelated columns.

If you have multiple rows in the flowfile content, you can use a CSVRecordReader/ScriptedReader and a CSVRecordSetWriter/ScriptedRecordSetWriter in conjunction with an UpdateRecord processor to reduce the content down to just the relevant fields. Then use a SplitRecord processor to generate an individual flowfile from each line, and pass all of them to FuzzyHashContent.

Andy LoPresto
[email protected]
[email protected]
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69

> On Oct 9, 2017, at 4:19 AM, shankhamajumdar <[email protected]> wrote:
>
> Hi Andy,
>
> Thanks for the reply. But I am still not able to solve my use case. For
> example, I have a data file in the below structure.
>
> Col1 Col2 Col3 Col4 Col5
>
> Test1 Test2 Test3 Test4 Test5
>
> I want to do a fuzzy matching on Col2 and Col3 and generate an output file.
>
> I am using getFile and FuzzyHashContent processor but not able to design the
> flow. Need your help on this.
>
> Regards,
> Shankha
>
> --
> Sent from: http://apache-nifi-developer-list.39713.n7.nabble.com/
