Dear all,
I need to filter large CSV files in a data flow. By filtering I mean two
things: cutting the file down to a subset of its columns, and keeping only the
rows where a particular field matches a parameter. I looked at the CSV-to-JSON
example, and I have a couple of questions:
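For illustration, here is roughly the result I am after, written in plain
Python rather than NiFi; the file names, column indices, and parameter value
are only placeholders:

import csv

# Placeholders standing in for the real flow's parameters.
WANTED_COLUMNS = [0, 5, 146]   # columns to keep
MATCH_COLUMN = 146             # field that must match the parameter
MATCH_VALUE = "some-value"     # the parameter value to match against

with open("input.csv", newline="") as src, \
     open("filtered.csv", "w", newline="") as dst:
    reader = csv.reader(src)
    writer = csv.writer(dst)
    for row in reader:
        # keep the row only when the chosen field matches the parameter
        if row[MATCH_COLUMN] == MATCH_VALUE:
            writer.writerow([row[i] for i in WANTED_COLUMNS])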
-First, I use a SplitText processor to get each line of the file. This makes
things slow, as it seems to generate a FlowFile for every single line. Do I
have to proceed this way, or is there an alternative? My CSV files are really
large and can have millions of lines.
-In a second step I extract the values with the (.+),(.+),….,(.+) technique,
one capturing group per field, before using a processor to check for a match
on ${csv.146}, for instance. Here I run into a problem: my CSV has 233 fields,
so I get the message: “RegEx is required to have between 1 and 40 capturing
groups but has 233”. Again, is there another way to proceed, or am I missing
something? (A small sketch of what I mean by this extraction step follows
below.)
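
To make the second point concrete, the extraction step amounts to something
like the following, again in plain Python as a stand-in for the NiFi
processor, with a made-up three-field line:

import re

line = "a,b,c"  # made-up three-field CSV line

# One capturing group per field; with my 233 fields the pattern
# would need 233 groups, which is what trips the 1-40 group limit
# from the error message above.
pattern = re.compile(r"(.+),(.+),(.+)")
match = pattern.match(line)
if match:
    # group 2 plays the role of ${csv.2} in the flow
    print(match.group(2))   # prints "b"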
Best regards,
Eric