[
https://issues.apache.org/jira/browse/NIFI-8932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Philipp Korniets updated NIFI-8932:
-----------------------------------
Description:
We have a lot of CSV files where the provider adds a custom header/footer to
otherwise valid CSV content.
The actual CSV header is the second row.
To remove the unnecessary data we can currently use
* ReplaceText
* SplitText -> RouteOnAttribute -> MergeContent
It would be great to have an option in the CSVReader controller service to skip
N rows from the top/bottom in order to get clean data:
* skip N from the top
* skip M from the bottom
A similar request was implemented in Flink:
https://issues.apache.org/jira/browse/FLINK-1002
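To make the requested behaviour concrete, here is a minimal Python sketch of "skip N from the top, skip M from the bottom" applied before CSV parsing. It is only an illustration, not NiFi code: the function name read_csv_skipping, its parameters skip_top and skip_bottom, and the small sample data are all hypothetical. It also assumes the lines to be skipped can be counted as physical lines, i.e. that there are no quoted fields with embedded newlines.

```python
import csv
import io


def read_csv_skipping(text, skip_top=0, skip_bottom=0):
    """Drop skip_top leading and skip_bottom trailing lines, then parse as CSV.

    Hypothetical helper for illustration; the first kept line is treated
    as the CSV header.
    """
    if skip_top < 0 or skip_bottom < 0:
        raise ValueError("skip counts must be non-negative")
    lines = text.splitlines()
    end = len(lines) - skip_bottom
    # If the skip counts consume the whole input, nothing is left to parse.
    kept = lines[skip_top:end] if end > skip_top else []
    return list(csv.DictReader(io.StringIO("\n".join(kept))))


# Made-up sample: one banner line on top, one totals line at the bottom.
sample = (
    "Report generated 7/20/21,,\n"
    "id,currency,amount\n"
    "1,EUR,10\n"
    "2,GBP,20\n"
    "Total,,30\n"
)
rows = read_csv_skipping(sample, skip_top=1, skip_bottom=1)
for row in rows:
    print(row)
```

This sketch loads the whole input into memory. A streaming reader could skip the top N lines directly, but to skip the bottom M lines it would probably need to hold back the last M lines it has read until it knows whether more input follows.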
Data Example:
{code}
7/20/21 2:48:47 AM GMT-04:00 ABB: Blended Rate Calc (X),,,,,,,,,,,
distribution_id,Distribution Id,settle_date,group_code,company_name,currency_code,common_account_name,business_date,prod_code,security,class,asset_type
-1,all,20210719,Repo 21025226,qwerty ,EUR,TPSL_21025226 ,19-Jul-21,BRM96ST7 ,ABC 14/09/24,NR,BOND
-1,all,20210719,Repo 21025226,qwerty ,GBP,RPSS_21025226 ,19-Jul-21,,Total @ -0.11,,
{code}
|7/20/21 2:48:47 AM GMT-04:00 ABB: Blended Rate Calc (X)| | | | | | | | | | | |
|distribution_id|Distribution Id|settle_date|group_code|company_name|currency_code|common_account_name|business_date|prod_code|security|class|asset_type|
|-1|all|20210719|Repo 21025226|qwerty |EUR|TPSL_21025226 |19-Jul-21|BRM96ST7 |ABC 14/09/24|NR|BOND |
|-1|all|20210719|Repo 21025226|qwerty |GBP|RPSS_21025226 |19-Jul-21| |Total @ -0.11| | |
> Add feature to CSVReader to skip N lines at top/bottom of the file
> ------------------------------------------------------------------
>
> Key: NIFI-8932
> URL: https://issues.apache.org/jira/browse/NIFI-8932
> Project: Apache NiFi
> Issue Type: Improvement
> Reporter: Philipp Korniets
> Priority: Minor