[ 
https://issues.apache.org/jira/browse/NIFI-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16792403#comment-16792403
 ] 

Dmitry Mashkov edited comment on NIFI-6093 at 4/16/19 8:49 AM:
---------------------------------------------------------------

Hi Matt,

My case: I receive very large XML files all at once; one file is 8 GB, another
27 GB. To make parsing more reliable, I process them chunk by chunk. Since each
file contains millions of records, the first step splits it into chunks of
1 million records, the next step splits each 1M-record chunk into chunks of
100k records, and the last step splits each 100k-record chunk into chunks of
1,000 records. I need to know when each chunking step is complete, so I use
Wait/Notify. The Wait processor needs to know how many notifications to expect
for a given chunk. Please take a look at the other "sibling" Split processors:
they all copy the split information to the original relationship, exactly for
this purpose. Of course, you are right that fragment.index is useless on the
original relationship, but fragment.count and fragment.identifier should be
present.

If you have more questions, you are welcome to ask.
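To make the expected counts concrete, here is a small, hypothetical Java sketch (the class and method names are illustrative and not part of NiFi). It assumes a file of 3,500,000 records and the 1M / 100k / 1,000 split sizes described above, and reports, for each stage, how many splits each parent chunk produces. That per-parent number is what the parent would need to carry as fragment.count so that a Wait knows how many notifications to expect for it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical helper (not NiFi code): for a staged split, compute how many
 * child splits each parent chunk produces, i.e. the fragment.count a Wait
 * for that parent would have to expect.
 */
public class StagedSplitPlan {

    // Number of splits produced when `records` records are cut into chunks of `chunkSize`.
    static long splitCount(long records, long chunkSize) {
        return (records + chunkSize - 1) / chunkSize; // ceiling division
    }

    // Record counts of the individual chunks (the last one may be smaller).
    static List<Long> chunkSizes(long records, long chunkSize) {
        List<Long> sizes = new ArrayList<>();
        for (long remaining = records; remaining > 0; remaining -= chunkSize) {
            sizes.add(Math.min(remaining, chunkSize));
        }
        return sizes;
    }

    public static void main(String[] args) {
        long totalRecords = 3_500_000L; // assumed record count, for illustration only
        long[] stageSizes = {1_000_000L, 100_000L, 1_000L};

        List<Long> parents = List.of(totalRecords);
        for (int stage = 0; stage < stageSizes.length; stage++) {
            long chunkSize = stageSizes[stage];
            // fragment.count value -> number of parents expecting that many notifications
            Map<Long, Integer> histogram = new TreeMap<>();
            List<Long> children = new ArrayList<>();
            for (long parent : parents) {
                histogram.merge(splitCount(parent, chunkSize), 1, Integer::sum);
                children.addAll(chunkSizes(parent, chunkSize));
            }
            System.out.println("stage " + (stage + 1) + " (" + chunkSize + "-record splits): " + histogram);
            parents = children;
        }
    }
}
```

With these assumed numbers it prints `{4=1}`, `{5=1, 10=3}` and `{100=35}`: the original file expects 4 notifications, three 1M-record chunks expect 10 each while the trailing 500k-record chunk expects 5, and each of the 35 100k-record chunks expects 100. The trailing partial chunk is why a fixed target count does not work and the count has to travel with the original FlowFile.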



> SplitRecord processor doesn't propagate fragment* attributes to original relationship
> -------------------------------------------------------------------------------------
>
>                 Key: NIFI-6093
>                 URL: https://issues.apache.org/jira/browse/NIFI-6093
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Extensions
>    Affects Versions: 1.9.0
>            Reporter: Dmitry Mashkov
>            Assignee: Matt Burgess
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Hello Team,
> As I already described in the summary, the SplitRecord processor does not add
> the fragment* attributes to the original FlowFile; as a result, it is
> impossible to use the Wait/Notify pattern to wait for the splits to be
> processed.
> I think the following patch can be applied:
> {code:java}
> Index: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/SplitRecord.java
> ===================================================================
> --- nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/SplitRecord.java (date 1550371815000)
> +++ nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/SplitRecord.java (date 1551441180000)
> @@ -206,7 +206,8 @@
> return;
> }
> - session.transfer(original, REL_ORIGINAL);
> + final FlowFile originalFlowFile = FragmentAttributes.copyAttributesToOriginal(session, original, fragmentId, splits.size());
> + session.transfer(originalFlowFile, REL_ORIGINAL);
> // Add the fragment count to each split
> for(FlowFile split : splits) {
> session.putAttribute(split, FRAGMENT_COUNT, String.valueOf(splits.size()));
> {code}
>  
> Sincerely,
> Dmitry.
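The proposed fix relies on `FragmentAttributes.copyAttributesToOriginal` putting fragment.identifier and fragment.count on the original FlowFile. The sketch below is a dependency-free model, not the NiFi API: FlowFiles are plain attribute maps, and `copyAttributesToOriginal`, `notifySignal` and `canRelease` are hypothetical stand-ins. It assumes a Wait configured, for example, with Release Signal Identifier `${fragment.identifier}` and Target Signal Count `${fragment.count}`, plus one Notify per processed split, and shows why the original cannot be released without those attributes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/**
 * Hypothetical model of the proposed fix. FlowFiles are plain attribute maps;
 * this is NOT the NiFi API, only an illustration of the Wait/Notify contract.
 */
public class FragmentCountDemo {

    static final String FRAGMENT_ID = "fragment.identifier";
    static final String FRAGMENT_COUNT = "fragment.count";

    // Stand-in for copying fragment.identifier and fragment.count onto the original.
    static Map<String, String> copyAttributesToOriginal(Map<String, String> original,
                                                        String fragmentId, int fragmentCount) {
        Map<String, String> updated = new HashMap<>(original);
        updated.put(FRAGMENT_ID, fragmentId);
        updated.put(FRAGMENT_COUNT, String.valueOf(fragmentCount));
        return updated;
    }

    // Simplified Notify: one counter per release signal identifier.
    static final Map<String, Integer> signals = new HashMap<>();

    static void notifySignal(String releaseSignalId) {
        signals.merge(releaseSignalId, 1, Integer::sum);
    }

    // Simplified Wait: release once the signal count reaches fragment.count.
    static boolean canRelease(Map<String, String> original) {
        String id = original.get(FRAGMENT_ID);
        String target = original.get(FRAGMENT_COUNT);
        if (id == null || target == null) {
            return false; // without the attributes there is no signal id or target to check
        }
        return signals.getOrDefault(id, 0) >= Integer.parseInt(target);
    }

    public static void main(String[] args) {
        String fragmentId = UUID.randomUUID().toString();
        int splitCount = 3;
        Map<String, String> original = Map.of("filename", "records.xml"); // hypothetical FlowFile

        // Current behaviour: the original carries no fragment attributes.
        System.out.println("without fragment attributes: " + canRelease(original));

        // Patched behaviour: the original carries fragment.identifier and fragment.count.
        Map<String, String> tagged = copyAttributesToOriginal(original, fragmentId, splitCount);
        for (int processed = 1; processed <= splitCount; processed++) {
            notifySignal(fragmentId); // one Notify per processed split
            System.out.println("after " + processed + " notification(s): " + canRelease(tagged));
        }
    }
}
```

Running it prints `false` for the untagged original, then `false`, `false`, `true` for the tagged one: only when the original knows its fragment.count can the Wait decide that all splits have been processed.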



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)