[ https://issues.apache.org/jira/browse/NIFI-8932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Philipp Korniets updated NIFI-8932:
-----------------------------------
    Description: 
We have a lot of CSV files in which the provider adds a custom header/footer around otherwise valid CSV content.
The actual CSV header is the second row.

To remove the unnecessary data we currently have to use either:
 * ReplaceText, or
 * SplitText -> RouteOnAttribute -> MergeContent
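The ReplaceText workaround boils down to a regular-expression rewrite of the whole file content. Below is a minimal sketch of patterns that could be tried, assuming ReplaceText is set to the Regex Replace strategy with the Entire text evaluation mode and an empty replacement. The class, the method names and the pattern strings are illustrative only, not taken from NiFi. It assumes LF line endings and shows that this approach has to hold the entire content in memory, which is part of why a reader-level option would be nicer.

```java
import java.util.regex.Pattern;

public class StripBannerRegex {

    // Matches the first `top` lines (each terminated by '\n') at the very start of the text.
    static Pattern leadingLines(int top) {
        return Pattern.compile("\\A(?:[^\\n]*\\n){" + top + "}");
    }

    // Matches the last `bottom` lines at the very end, tolerating an optional trailing newline.
    static Pattern trailingLines(int bottom) {
        return Pattern.compile("(?:\\n[^\\n]*){" + bottom + "}\\n?\\z");
    }

    // Hypothetical helper: applies both patterns, header first, then footer.
    static String strip(String text, int top, int bottom) {
        String withoutHeader = leadingLines(top).matcher(text).replaceFirst("");
        return trailingLines(bottom).matcher(withoutHeader).replaceFirst("");
    }

    public static void main(String[] args) {
        // Made-up sample: banner line, CSV header, two records, footer line.
        String content = "banner line\nid,currency\n1,EUR\n2,GBP\nfooter line\n";
        System.out.println(strip(content, 1, 1));
        // prints the three lines "id,currency", "1,EUR", "2,GBP"
    }
}
```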

It would be great to have an option in the CSVReader controller service to skip rows at the top/bottom of the file in order to get clean data:
 * skip N rows from the top
 * skip M rows from the bottom
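One way the requested "skip N from the top / skip M from the bottom" behaviour could work without reading the whole file first: drop the first N lines as they arrive, and keep only a small buffer of the most recent M lines, since a line is known not to belong to the footer only once M further lines have been read after it. This is a minimal sketch, not NiFi code; `skipLines` and its parameters are hypothetical names. It counts physical lines, not CSV records, so a quoted field containing a line break would be counted as two lines. Whether a real option should count lines or records is an open design choice.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

public class SkipTopBottomLines {

    /**
     * Hypothetical helper: drops the first skipTop and the last skipBottom lines.
     * At most skipBottom lines are held back in 'pending'; for simplicity this
     * sketch collects the kept lines into a list, whereas a streaming reader
     * would hand each kept line straight to the CSV parser.
     */
    static List<String> skipLines(Iterator<String> lines, int skipTop, int skipBottom) {
        if (skipTop < 0 || skipBottom < 0) {
            throw new IllegalArgumentException("skip counts must be >= 0");
        }
        List<String> kept = new ArrayList<>();
        Deque<String> pending = new ArrayDeque<>();
        int seen = 0;
        while (lines.hasNext()) {
            String line = lines.next();
            if (seen++ < skipTop) {
                continue; // part of the custom header block
            }
            pending.addLast(line);
            if (pending.size() > skipBottom) {
                kept.add(pending.removeFirst()); // oldest buffered line cannot be footer
            }
        }
        return kept; // lines still in 'pending' are the discarded footer
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample: banner line, CSV header, two records, footer line.
        String content = "banner line\nid,currency\n1,EUR\n2,GBP\nfooter line\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(content))) {
            for (String line : skipLines(reader.lines().iterator(), 1, 1)) {
                System.out.println(line); // id,currency / 1,EUR / 2,GBP
            }
        }
    }
}
```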

A similar feature was developed in Flink:
https://issues.apache.org/jira/browse/FLINK-1002

 

Data Example:
|7/20/21 2:48:47 AM GMT-04:00  ABB: Blended Rate Calc (X)| | | | | | |
|distribution_id|Distribution Id|settle_date|group_code|company_name|currency_code|common_account_name|business_date|prod_code|security|class|asset_type|
|-1|all|20210719|Repo     21025226|qwerty                                    |EUR|TPSL_21025226   |19-Jul-21|BRM96ST7                       |ABC 14/09/24|NR|BOND      |
|-1|all|20210719|Repo     21025226|qwerty                                    |GBP|RPSS_21025226   |19-Jul-21| |Total @ -0.11| | |



> Add feature to CSVReader to skip N lines at top/bottom of the file
> ------------------------------------------------------------------
>
>                 Key: NIFI-8932
>                 URL: https://issues.apache.org/jira/browse/NIFI-8932
>             Project: Apache NiFi
>          Issue Type: Improvement
>            Reporter: Philipp Korniets
>            Priority: Minor



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
