Jasper Knulst created NIFI-7946:
-----------------------------------

             Summary: Add property to CSVReader to treat multiple delimiters as 1
                 Key: NIFI-7946
                 URL: https://issues.apache.org/jira/browse/NIFI-7946
             Project: Apache NiFi
          Issue Type: Improvement
          Components: Core Framework
    Affects Versions: 1.12.1
            Reporter: Jasper Knulst
            Assignee: Jasper Knulst


It would be really great to have an additional property for the CSVReader 
controller service to treat multiple consecutive delimiter occurrences as only 
one. This is something you can do in Excel, for instance.

There are many CSV-like formats that have multiple delimiters following each 
other, for instance to aid in aligning columns:

Device          rReq_PS      wReq_PS        rKB_PS        wKB_PS  avgWaitMillis   avgSvcMillis   bandwUtilPct
md1                 0.0          0.0           0.0           0.0            0.0            0.0              0
md10                0.0          8.0           0.0          75.8            0.0            7.5              5
md11                0.0          0.0           0.0           0.0            0.0            0.0              0
md20                0.0          8.0           0.0          75.8            0.0            6.7              5
md21                0.0          0.0           0.0           0.0            0.0            0.0              0
md30                0.0          0.0           0.0           0.0            0.0            0.0              0
md100               0.0          8.0           0.0          75.8            0.0            8.1              6
sd0                 0.0          0.0           0.0           0.0            0.0            0.0              0
sd1                 0.0          8.0           0.0          75.8            0.0            6.6              5
sd2                 0.0          0.0           0.0           0.0            0.0            0.0              0
sd3                 0.0          8.0           0.0          75.8            0.0            7.4              5

Running the CSVReader with " " as the delimiter on the above leads to havoc, 
because every extra space is read as an empty field. If only the CSVReader 
would treat the input as below:

Device rReq_PS wReq_PS rKB_PS wKB_PS avgWaitMillis avgSvcMillis bandwUtilPct
md1 0.0 0.0 0.0 0.0 0.0 0.0 0
md10 0.0 8.0 0.0 75.8 0.0 7.5 5
md11 0.0 0.0 0.0 0.0 0.0 0.0 0
md20 0.0 8.0 0.0 75.8 0.0 6.7 5
md21 0.0 0.0 0.0 0.0 0.0 0.0 0
md30 0.0 0.0 0.0 0.0 0.0 0.0 0
md100 0.0 8.0 0.0 75.8 0.0 8.1 6
sd0 0.0 0.0 0.0 0.0 0.0 0.0 0
sd1 0.0 8.0 0.0 75.8 0.0 6.6 5
sd2 0.0 0.0 0.0 0.0 0.0 0.0 0
sd3 0.0 8.0 0.0 75.8 0.0 7.4 5

it would go well. I know that a ReplaceText processor could easily do the same 
as a preceding step, but this is not always possible (for example with no-input 
processors like TCPRecordReader), and I also believe fewer processors is better.
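
To make the requested behaviour concrete, here is a rough Python sketch (not 
NiFi code; the names collapse_delimiters and parse are hypothetical and only 
for illustration). It parses a shortened version of the sample above once as-is 
and once with runs of the delimiter collapsed into one:

```python
import csv
import re


def collapse_delimiters(line: str, delimiter: str = " ") -> str:
    """Replace each run of consecutive delimiters with one and trim the ends."""
    pattern = re.escape(delimiter) + "+"
    return re.sub(pattern, delimiter, line).strip(delimiter)


def parse(text: str, delimiter: str = " ", collapse: bool = False):
    """Parse delimited text, optionally collapsing repeated delimiters first."""
    lines = text.splitlines()
    if collapse:
        lines = [collapse_delimiters(line, delimiter) for line in lines]
    return list(csv.reader(lines, delimiter=delimiter))


# Shortened version of the column-aligned sample (first three columns only).
sample = (
    "Device          rReq_PS      wReq_PS\n"
    "md10                0.0          8.0\n"
)

raw_rows = parse(sample)                  # every extra space becomes an empty field
clean_rows = parse(sample, collapse=True) # one field per column

print(len(raw_rows[0]), "fields without collapsing")
print(clean_rows)
```

Note that this sketch collapses delimiters before parsing, so it would also 
alter runs of spaces inside quoted values. A property inside the reader itself 
would presumably need to respect quoting, which is one more reason to handle 
this in the CSVReader rather than in a preceding text-replacement step.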



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
