Vyshali,
This would make an excellent example of a ScriptedRecordSetWriter [1].
You can use ConvertRecord with a CSVReader to read in the records
(which can be accessed by field name), and replace each with a fake
instance of the same using your ScriptedRecordSetWriter
implementation. I see you're using defaultdict() to initialize fake
instances on-the-fly, to replace the same field values in all records
with the same fake instance, thereby getting some anonymization while
retaining the referential integrity. Very cool!
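To make the defaultdict() idea concrete, here is a minimal sketch in plain Python, runnable outside NiFi. The make_fake_factory helper and the sample records are made up for illustration; the helper is only a stand-in for Faker calls such as faker.name. The first lookup of a value creates a fake, and every later lookup of that value returns the same fake.

```python
from collections import defaultdict
from itertools import count


def make_fake_factory(prefix):
    """Return a zero-argument callable producing a new placeholder per call.

    Hypothetical stand-in for a Faker provider such as faker.name.
    """
    counter = count(1)
    return lambda: "%s-%d" % (prefix, next(counter))


# A missing key calls the factory once; the result is then cached,
# so repeated real values always map to the same fake value.
names = defaultdict(make_fake_factory("name"))

# Made-up sample records, not taken from the original data.
records = [
    {"name": "Alice", "city": "Pune"},
    {"name": "Bob", "city": "Pune"},
    {"name": "Alice", "city": "Delhi"},
]

# Replace only the sensitive field; other fields pass through unchanged.
anonymized = [dict(r, name=names[r["name"]]) for r in records]
```

Both "Alice" records end up with the same fake name, so joins and group-bys on that field still line up after anonymization.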
You can set up the defaultdict() lookups in the constructor of your
ScriptedRecordSetWriter, read the field values from each incoming
Record, and write out the fake lookups in their place. Check out
Drew Lim's awesome article on how to use ScriptedRecordSetWriter [2].
He uses XML as the output format, but you just need to write out CSV
or JSON or whatever, using the fake versions of the fields' values.
Please let me know if you try this and run into any trouble; we'd be
happy to help get you going.
If you'd rather stick with ExecuteScript, try adding some logging in
there (using the provided "log" object, you can do things like
log.info("my text = " + text) and such) to see if you are getting the
lookups correctly, and getting the conversion to JSON correctly.
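One thing such logging would likely show: csv.DictReader is a one-pass iterator, so calling list() on it after a for loop has already consumed it returns an empty list, which serializes to "[]". The sketch below is plain Python 3 meant to run outside NiFi, not a drop-in ExecuteScript body. The standard logging module stands in for NiFi's "log" object, and fake_factory and anonymize_csv are hypothetical names. It collects each row while iterating and then converts the collected rows to JSON.

```python
import csv
import io
import json
import logging
from collections import defaultdict
from itertools import count

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("anonymize")  # stand-in for NiFi's "log" object


def fake_factory(prefix):
    """Hypothetical stand-in for a Faker provider such as faker.email."""
    counter = count(1)
    return lambda: "%s-%d" % (prefix, next(counter))


def anonymize_csv(text, fields):
    """Replace the given fields with consistent fakes and return JSON."""
    lookups = {f: defaultdict(fake_factory(f)) for f in fields}
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        for f in fields:
            row[f] = lookups[f][row[f]]
        # Keep the row now; the reader cannot be iterated a second time.
        rows.append(dict(row))
    log.info("anonymized %d rows", len(rows))
    return json.dumps(rows)


# Made-up sample input for illustration.
sample = "name,email\nAlice,a@example.com\nBob,b@example.com\nAlice,a@example.com\n"
result = anonymize_csv(sample, ["name", "email"])
```

Inside ExecuteScript, the same loop would go in the process() method, with the returned string written to the output stream.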
Regards,
Matt
[1]
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-scripting-nar/1.4.0/org.apache.nifi.record.script.ScriptedRecordSetWriter/index.html
[2]
https://community.hortonworks.com/articles/115311/convert-csv-to-json-avro-xml-using-convertrecord-p.html
On Sun, Nov 5, 2017 at 1:52 PM, Vyshali <[email protected]> wrote:
> Hi,
>
> I have modified the code after understanding the DictReader
> functionality. However, I face some issues.
> I have added my code here. Using the "GetFile" processor, a CSV file is read with
> fields name, phone_number, email and ssn, which is then sent as input to
> ExecuteScript. I'm using DictReader to read the data and modify the fields
> using the Faker package, but I'm unable to modify them. I'm getting empty JSON at the
> end. Please give me some suggestions on how to solve this problem.
>
> import java.io
> from org.apache.commons.io import IOUtils
> from java.nio.charset import StandardCharsets
> from org.apache.nifi.processor.io import StreamCallback
> import unicodecsv as csv
> from faker import Factory
> from collections import defaultdict
> import json
> import csv
> import io
>
> class TransformCallback(StreamCallback):
>     def _init_(self):
>         pass
>
>     def process(self,inputStream,outputStream):
>         inputdata = IOUtils.toString(inputStream,StandardCharsets.ISO_8859_1)
>         text = csv.DictReader(io.StringIO(inputdata))
>         faker = Factory.create()
>         names = defaultdict(faker.name)
>         emails = defaultdict(faker.email)
>         ssns = defaultdict(faker.ssn)
>         phone_numbers = defaultdict(faker.phone_number)
>
>         for row in text:
>             row["name"] = names[row["name"]]
>             row["email"] = emails[row["email"]]
>             row["ssn"] = ssns[row["ssn"]]
>             row["phone_number"] = phone_numbers[row["phone_number"]]
>         textdata = list(text)
>         values_str = json.dumps(textdata)
>         outputStream.write(values_str.encode('utf-8'))
>
> flowFile = session.get()
> if flowFile != None:
>     flowFile = session.write(flowFile,TransformCallback())
>     session.transfer(flowFile, REL_SUCCESS)
>     session.commit()
>
> References of the code:
> http://go.databricks.com/hubfs/notebooks/blogs/Healthcare%20PII%20anonymization/Healthcare%20PII%20anonymization%20example.html
> https://stackoverflow.com/questions/31658115/python-csv-dictreader-parse-string
> https://stackoverflow.com/questions/19664145/how-to-convert-list-of-nested-dictionaries-into-string-and-vice-versa
>
>
> Thanks,
> Vyshali
>
> --
> Sent from: http://apache-nifi-developer-list.39713.n7.nabble.com/