Jasper Knulst created NIFI-7946:
-----------------------------------
Summary: Add property to CSVReader to treat multiple delimiters as 1
Key: NIFI-7946
URL: https://issues.apache.org/jira/browse/NIFI-7946
Project: Apache NiFi
Issue Type: Improvement
Components: Core Framework
Affects Versions: 1.12.1
Reporter: Jasper Knulst
Assignee: Jasper Knulst
It would be really great to have an additional property for the CSVReader
controller service to treat multiple consecutive delimiter occurrences as only
one. This is something you can do in Excel, for instance.
There are many CSV like formats that have multiple delimiters following each
other, for instance to aid in aligning columns:
Device   rReq_PS   wReq_PS   rKB_PS   wKB_PS   avgWaitMillis   avgSvcMillis   bandwUtilPct
md1      0.0       0.0       0.0      0.0      0.0             0.0            0
md10     0.0       8.0       0.0      75.8     0.0             7.5            5
md11     0.0       0.0       0.0      0.0      0.0             0.0            0
md20     0.0       8.0       0.0      75.8     0.0             6.7            5
md21     0.0       0.0       0.0      0.0      0.0             0.0            0
md30     0.0       0.0       0.0      0.0      0.0             0.0            0
md100    0.0       8.0       0.0      75.8     0.0             8.1            6
sd0      0.0       0.0       0.0      0.0      0.0             0.0            0
sd1      0.0       8.0       0.0      75.8     0.0             6.6            5
sd2      0.0       0.0       0.0      0.0      0.0             0.0            0
sd3      0.0       8.0       0.0      75.8     0.0             7.4            5
Executing the CSVReader with " " as the delimiter on the above leads to havoc,
since every extra space is parsed as an empty field. If only the CSVReader
would treat the input as below:
Device rReq_PS wReq_PS rKB_PS wKB_PS avgWaitMillis avgSvcMillis bandwUtilPct
md1 0.0 0.0 0.0 0.0 0.0 0.0 0
md10 0.0 8.0 0.0 75.8 0.0 7.5 5
md11 0.0 0.0 0.0 0.0 0.0 0.0 0
md20 0.0 8.0 0.0 75.8 0.0 6.7 5
md21 0.0 0.0 0.0 0.0 0.0 0.0 0
md30 0.0 0.0 0.0 0.0 0.0 0.0 0
md100 0.0 8.0 0.0 75.8 0.0 8.1 6
sd0 0.0 0.0 0.0 0.0 0.0 0.0 0
sd1 0.0 8.0 0.0 75.8 0.0 6.6 5
sd2 0.0 0.0 0.0 0.0 0.0 0.0 0
sd3 0.0 8.0 0.0 75.8 0.0 7.4 5
it would go well. I know that a ReplaceText processor could easily do the same
as a preceding step, but this is not always possible (for example with no-input
processors like TCPRecordReader), and I also believe fewer processors is better.
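
To illustrate the requested behaviour outside NiFi, here is a minimal Python
sketch. It is not the CSVReader implementation and not a proposed patch; the
function names read_naive and read_collapsed and the shortened SAMPLE input are
made up for this example. It contrasts plain single-delimiter parsing with
parsing after runs of the delimiter have been collapsed into one:

```python
import csv
import io
import re

# Shortened, hypothetical sample in the same style as the data above.
SAMPLE = (
    "Device  rReq_PS  wReq_PS\n"
    "md1     0.0      0.0\n"
    "md100   0.0      8.0\n"
)


def read_naive(text, delimiter=" "):
    """Parse with a single-character delimiter; each extra delimiter yields an empty field."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


def read_collapsed(text, delimiter=" "):
    """Collapse runs of the delimiter into one before parsing.

    Assumption: fields are unquoted, so delimiters inside quoted values
    are not handled specially here.
    """
    run = re.compile(re.escape(delimiter) + "+")
    lines = (run.sub(delimiter, line.strip(delimiter)) for line in text.splitlines())
    return list(csv.reader(lines, delimiter=delimiter))


if __name__ == "__main__":
    print(read_naive(SAMPLE)[1])      # contains many empty strings
    print(read_collapsed(SAMPLE)[1])  # three clean fields
```

A real property on the reader would have to decide how collapsing interacts
with quoted fields and with genuinely empty values, which this sketch ignores.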
--
This message was sent by Atlassian Jira
(v8.3.4#803005)