[ 
https://issues.apache.org/jira/browse/NIFI-9224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17448470#comment-17448470
 ] 

Wang Qingming edited comment on NIFI-9224 at 11/24/21, 9:36 AM:
----------------------------------------------------------------

Hello, I am also a NiFi user in China. We have developed a processor that 
splits large CSV files, and we plan to contribute it to the NiFi project later. 
It uses the open-source commons-csv project:
 
 
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-csv</artifactId>
    <version>1.5</version>
</dependency>
 
 
The core code is as follows, for reference.
 

try (InputStream in = session.read(incomingCSV)) {
    InputStreamReader isr = new InputStreamReader(in, charset);
    Reader reader = new BufferedReader(isr);
    // withQuote(null) disables quote handling, so quote characters are read literally
    CSVParser parser = CSVFormat.EXCEL.withHeader(headers).withQuote(null).parse(reader);
    Iterator<CSVRecord> csvIterator = parser.iterator();

    // Read the CSV file one record at a time
    while (csvIterator.hasNext()) {
        CSVRecord record = csvIterator.next();
        // Handle other business logic here
    }
} catch (IOException e) {
    // log
}
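
The loop above visits one record at a time; a splitting processor would 
additionally need to group those records into chunks. A minimal sketch of that 
grouping step follows, using only the JDK so it runs without commons-csv. The 
class name RecordBatcher and the method forEachBatch are hypothetical and are 
not part of NiFi or commons-csv; the helper accepts any Iterator, so it should 
also work with the Iterator<CSVRecord> shown above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not part of NiFi or commons-csv.
public class RecordBatcher {

    /**
     * Drains the iterator and passes the records to the sink in lists of at
     * most batchSize elements. Only one batch is held in memory at a time.
     * Returns the number of batches emitted.
     */
    public static <T> int forEachBatch(Iterator<T> records, int batchSize,
                                       Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        List<T> batch = new ArrayList<>(batchSize);
        while (records.hasNext()) {
            batch.add(records.next());
            if (batch.size() == batchSize) {
                sink.accept(batch);
                batches++;
                // Start a fresh list so the sink may keep the previous one
                batch = new ArrayList<>(batchSize);
            }
        }
        // Emit the final, possibly shorter, batch
        if (!batch.isEmpty()) {
            sink.accept(batch);
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        Iterator<String> rows = Arrays.asList("r1", "r2", "r3", "r4", "r5").iterator();
        int n = forEachBatch(rows, 2, batch -> System.out.println(batch));
        System.out.println(n + " batches");
    }
}
```

In a processor, the sink would presumably write each batch to its own output 
FlowFile, repeating the header row if downstream consumers expect one.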



> Read or write files in shards
> -----------------------------
>
>                 Key: NIFI-9224
>                 URL: https://issues.apache.org/jira/browse/NIFI-9224
>             Project: Apache NiFi
>          Issue Type: New Feature
>          Components: Extensions
>    Affects Versions: 1.15.0
>            Reporter: Every
>            Priority: Major
>              Labels: fetchfile, fragment
>
> For very large files, FetchFile cannot read the whole file into NiFi at 
> once, and no suitable processor was found that reads a file in shards. This 
> issue therefore proposes a service and processors that read files in 
> line-based shards and write files shard by shard, so that very large files 
> can be processed with fewer resources.
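
To illustrate what reading a file in line-based shards could look like, here is 
a minimal JDK-only sketch. It is not the implementation proposed in this issue: 
the class name LineShardReader, the method readShard, and the parameters 
shardIndex and linesPerShard are all hypothetical, and UTF-8 is assumed as the 
file encoding.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not an existing NiFi processor or service.
public class LineShardReader {

    /**
     * Returns the lines of shard number shardIndex (0-based), where each shard
     * covers linesPerShard consecutive lines. Lines before the shard are read
     * and discarded; lines after it are never read. An empty list means the
     * shard starts at or past the end of the file.
     */
    public static List<String> readShard(Path file, long shardIndex, int linesPerShard)
            throws IOException {
        if (shardIndex < 0 || linesPerShard <= 0) {
            throw new IllegalArgumentException(
                    "shardIndex must be >= 0 and linesPerShard must be > 0");
        }
        List<String> shard = new ArrayList<>(linesPerShard);
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            // Skip the lines that belong to earlier shards
            long toSkip = shardIndex * linesPerShard;
            while (toSkip > 0 && reader.readLine() != null) {
                toSkip--;
            }
            // Collect at most linesPerShard lines for this shard
            String line;
            while (shard.size() < linesPerShard && (line = reader.readLine()) != null) {
                shard.add(line);
            }
        }
        return shard;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("shard-demo", ".txt");
        try {
            Files.write(tmp, Arrays.asList("l1", "l2", "l3", "l4", "l5"),
                    StandardCharsets.UTF_8);
            for (long i = 0; ; i++) {
                List<String> shard = readShard(tmp, i, 2);
                if (shard.isEmpty()) {
                    break;
                }
                System.out.println("shard " + i + ": " + shard);
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Memory use is bounded by one shard, but each call rescans the file from the 
start, so reading every shard this way costs quadratic time overall. A real 
processor would more likely keep the reader open or record byte offsets between 
shards.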



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
