Hi,

I am learning NiFi. 

I have created an input csv (CityCode.csv) file as below:
ID, CITY_NAME, ZIP_CD, STATE_CD
1,  Delhi,     110001, DL
2,  Mumbai,    400001, MH
3,  Chennai,   600001, TN
4,  Bangalore, 560001, KA

This is my 1st dataflow. I am building it block by block and I am planning
to create a dataflow like this.  
GetFile -> InferAvroSchema -> SplitText -> ConvertCSVToAvro -> ExtractText
    -> if error, put in Kafka
    -> if success, put in DB
I might add a few more functionalities in between to strengthen my knowledge.

InitialFlow.jpg
<http://apache-nifi-developer-list.39713.n7.nabble.com/file/t1006/InitialFlow.jpg>
  

I have built the dataflow up to ConvertCSVToAvro, and I have a few questions
about the flow so far.

I use the GetFile processor to pick up a CSV file from the directory
D:\ApacheNiFi\source-data. If GetFile succeeds, the flow moves to
“CreateInferAvroSchema”.
The InferAvroSchema processor is configured as below:

•       Schema Output Destination - flowfile-attribute
•       Input Content Type - CSV
•       CSV Header Definition - 
•       Get CSV Header Definition From Data - true
•       CSV Header Line Skip Count – 1
•       CSV delimiter –  .
•       CSV Escape String -  /
•       CSV Quote String – ‘ 
•       Pretty Avro Output - true
•       Avro Record Name - CityCode
•       Number of Records To Analyze - 10
•       Charset – UTF8
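For illustration, here is a minimal Python sketch of the kind of schema inference these settings describe: read the header line for field names, sample up to 10 records, and guess a type per column. It is plain Python, not NiFi code, and it does not claim to match the processor's actual output. The names `guess_type` and `infer_schema` are hypothetical. The sketch defaults to a comma delimiter because the sample file is comma-separated.

```python
import csv
import io
import json

SAMPLE = """ID,      CITY_NAME,      ZIP_CD,         STATE_CD
1,      Delhi,          110001, DL
2,      Mumbai, 400001, MH
3,      Chennai,        600001, TN
4,      Bangalore,      560001, KA
"""


def guess_type(values):
    """Return a coarse Avro primitive type that fits every sampled value."""
    try:
        for v in values:
            int(v)
        return "long"
    except ValueError:
        pass
    try:
        for v in values:
            float(v)
        return "double"
    except ValueError:
        return "string"


def infer_schema(text, record_name, delimiter=",", sample_size=10):
    """Build an Avro-style record schema from the header and first rows."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter,
                        skipinitialspace=True)
    # Field names come from the first line (the CSV header).
    header = [name.strip() for name in next(reader)]
    # Only the first `sample_size` records are analyzed.
    rows = [row for _, row in zip(range(sample_size), reader)]
    fields = []
    for i, name in enumerate(header):
        column = [row[i].strip() for row in rows]
        fields.append({"name": name, "type": guess_type(column)})
    return {"type": "record", "name": record_name, "fields": fields}


schema = infer_schema(SAMPLE, "CityCode")
print(json.dumps(schema, indent=2))
```

In this sketch, passing `delimiter="."` leaves every line as a single field, since the sample data contains no periods. That may be worth checking against the delimiter value configured above.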

Scheduling
Scheduling Strategy - Timer Driven,  Concurrent Tasks – 1, Run Schedule – 0
sec
Settings
•       I have checked the “original” relationship under Automatically
Terminate Relationships because I do not understand what exactly this
relationship is for
•       Failure & Unsupported content – Put in file in directory 
“D:\ApacheNiFi\error-data”
•       Success – SplitText

The reason I placed the SplitText processor before the ConvertCSVToAvro
processor is that the conversion processor is not able to route only the
failed records to failure; it sends the whole file and adds an “error”
attribute for the failed records. One specific post recommended splitting the
records first and then converting them to Avro:
https://stackoverflow.com/questions/41840726/nifi-convertcsvtoavro-how-to-capture-the-failed-records
  

The SplitText processor is configured as below:
Line Split Count - 1
Header Line Count - 1 (I kept this as 1 because my file has a header line)
Remove Trailing Newlines - true

Splits - flows to the next processor, “ConvertCSVToAvro”
Original - I created a PutFile processor that stores the file in the
directory "D:\ApacheNiFi\processed-data".
Failure - I am routing it to the same processor 
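As a rough illustration of the split settings above (one data line per split, one header line, trailing newlines removed), the following Python sketch splits a text into per-record pieces and repeats the header in each piece. It is a sketch of the idea only, not NiFi code. The function name `split_text` and its parameters are hypothetical stand-ins for the processor properties.

```python
def split_text(content, line_split_count=1, header_line_count=1):
    """Split text into chunks, repeating the header lines in every chunk."""
    lines = content.splitlines()
    header = lines[:header_line_count]
    body = lines[header_line_count:]
    splits = []
    for start in range(0, len(body), line_split_count):
        chunk = header + body[start:start + line_split_count]
        # Joining without a final "\n" mimics removing trailing newlines.
        splits.append("\n".join(chunk))
    return splits


sample = "ID,CITY_NAME,ZIP_CD,STATE_CD\n1,Delhi,110001,DL\n2,Mumbai,400001,MH\n"
for piece in split_text(sample):
    print(repr(piece))
```

Each piece here corresponds to one item on the “splits” relationship, while the unsplit input corresponds to “original”.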

1st question:
Is it possible to attach some kind of attribute to distinguish each record
that is split? For example, can a unique ID be attached to each record as an
attribute? If yes, how can I do that? Are there any instructions or material
that would help me add an attribute? I tried adding an “UpdateAttribute”
processor to see if I could achieve this, but could not find anything
related.
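To make the goal of this question concrete, here is a minimal Python sketch of tagging each split record with a unique ID and its position. It is plain Python rather than NiFi configuration, and the attribute names `record.id`, `record.index` and `source.filename` are hypothetical. Within NiFi itself, as far as I know, every FlowFile already carries a `uuid` attribute, SplitText writes attributes such as `fragment.index` and `fragment.identifier`, and UpdateAttribute can set a property using the Expression Language function `${UUID()}`. These points are worth verifying against the processor documentation.

```python
import uuid


def tag_records(records, source_name):
    """Wrap each record with attributes that make it uniquely identifiable."""
    tagged = []
    for index, record in enumerate(records, start=1):
        attributes = {
            "record.id": str(uuid.uuid4()),   # random unique ID (hypothetical name)
            "record.index": index,            # 1-based position in the source file
            "source.filename": source_name,   # where the record came from
        }
        tagged.append({"attributes": attributes, "content": record})
    return tagged


tagged = tag_records(["1,Delhi,110001,DL", "2,Mumbai,400001,MH"], "CityCode.csv")
for item in tagged:
    print(item["attributes"]["record.index"], item["attributes"]["record.id"])
```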

2nd question:
I also need to check if the input string in each field of the record is of
35 characters. Only then should the record follow the “splits” relationship;
otherwise it should be routed to failure.
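The check described in this question can be sketched in plain Python as follows. This is an illustration of the routing logic only, not NiFi configuration. It assumes that "of 35 characters" means "at most 35 characters per field"; if an exact length is intended, the comparison would need to change. The names `route_record` and `partition` are hypothetical.

```python
import csv
import io

MAX_FIELD_LENGTH = 35  # assumption: each field may hold at most 35 characters


def route_record(line, max_length=MAX_FIELD_LENGTH):
    """Return "success" if every field fits the limit, else "failure"."""
    if not line.strip():
        return "failure"  # an empty record has nothing to validate
    fields = next(csv.reader(io.StringIO(line), skipinitialspace=True))
    too_long = [f for f in fields if len(f.strip()) > max_length]
    return "failure" if too_long else "success"


def partition(lines):
    """Group records by the relationship they would be routed to."""
    routed = {"success": [], "failure": []}
    for line in lines:
        routed[route_record(line)].append(line)
    return routed


records = ["1, Delhi, 110001, DL", "5, " + "X" * 36 + ", 999999, ZZ"]
routed = partition(records)
print(len(routed["success"]), "valid,", len(routed["failure"]), "invalid")
```

The records collected under "failure" are the ones that would be written to an error file such as "InvalidRecords.csv".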

Any guidance will be very helpful. I hope I am not sounding very stupid. 

Is there any material I could use to practise these kinds of activities, such
as validating records against some conditions, or specifying a filename like
"InvalidRecords.csv" for capturing error records in the folder configured in
the PutFile processor? Everything seems so confusing, and I am not able to
find enough material to learn this.

Thanks for your patience and time

Thanks
Dave



