[ 
https://issues.apache.org/jira/browse/MINIFICPP-726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812319#comment-16812319
 ] 

Arpad Boda commented on MINIFICPP-726:
--------------------------------------

[~phrocker]: yes, some progress has been made; I implemented the regex logic. 

Your point about I/O is absolutely fair. 

My initial idea for this was to provide a caching content repo: it writes data 
to I/O, but only removes data from memory if it hasn't been read for a while. 
This timeout could be configurable. I think this feature could ensure that no 
I/O read happens, as the data is still in memory while the given flowfile goes 
through the flowchain. 
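
A rough sketch of the caching idea, to make it concrete. Everything here is 
hypothetical: CachingContentRepo, its methods, and the std::map standing in for 
the persistent store are illustrative names, not existing MiNiFi C++ APIs. The 
caller passes in the current time so that the idle timeout is easy to reason 
about.

```cpp
#include <chrono>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical sketch; none of these names exist in MiNiFi C++.
class CachingContentRepo {
 public:
  using Clock = std::chrono::steady_clock;

  explicit CachingContentRepo(std::chrono::milliseconds idle_timeout)
      : idle_timeout_(idle_timeout) {}

  // Write-through: persist first, then keep a copy in memory.
  void write(const std::string& id, std::string data, Clock::time_point now) {
    disk_[id] = data;  // stand-in for the real persistent write
    cache_[id] = Entry{std::move(data), now};
  }

  // Serve from memory when possible; fall back to the persistent store.
  std::optional<std::string> read(const std::string& id, Clock::time_point now) {
    auto hit = cache_.find(id);
    if (hit != cache_.end()) {
      hit->second.last_access = now;  // a read keeps the entry alive
      return hit->second.data;
    }
    auto stored = disk_.find(id);
    if (stored == disk_.end()) return std::nullopt;
    ++disk_reads_;  // this is the I/O read the cache tries to avoid
    cache_[id] = Entry{stored->second, now};
    return stored->second;
  }

  // Drop cached entries that were not accessed within the timeout.
  // The persisted copy is left untouched.
  std::size_t evictIdle(Clock::time_point now) {
    std::size_t evicted = 0;
    for (auto it = cache_.begin(); it != cache_.end();) {
      if (now - it->second.last_access >= idle_timeout_) {
        it = cache_.erase(it);
        ++evicted;
      } else {
        ++it;
      }
    }
    return evicted;
  }

  std::size_t diskReads() const { return disk_reads_; }
  std::size_t cachedEntries() const { return cache_.size(); }

 private:
  struct Entry {
    std::string data;
    Clock::time_point last_access;
  };
  std::chrono::milliseconds idle_timeout_;
  std::unordered_map<std::string, Entry> cache_;
  std::map<std::string, std::string> disk_;  // simulated persistent store
  std::size_t disk_reads_ = 0;
};
```

As long as the timeout is longer than the time a flowfile spends in the 
flowchain, every read in this sketch is served from memory and diskReads() 
stays at zero. 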

Doing it in the write phase sounds better (more efficient in memory handling), 
but I wonder how we can do that while trying to keep compatibility with NiFi.

A NiFi-Fn-like behavior (all or nothing, only persist at the end of the 
flowchain) would also make sense in such cases. This would allow even 
modification of the content without paying huge I/O costs. 
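
The all-or-nothing behavior could look roughly like the sketch below. It is 
only an illustration under my own assumptions: InMemoryFlowSession and its 
methods are made-up names (not NiFi-Fn or MiNiFi C++ APIs), and a std::map 
stands in for the persistent content repository.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical sketch; these names are illustrative only.
class InMemoryFlowSession {
 public:
  explicit InMemoryFlowSession(std::map<std::string, std::string>& store)
      : store_(store) {}

  // Stage or overwrite content in memory only; nothing is persisted yet.
  void put(const std::string& id, std::string data) {
    pending_[id] = std::move(data);
  }

  // Modify staged content in place, again without touching the store.
  bool append(const std::string& id, const std::string& suffix) {
    auto it = pending_.find(id);
    if (it == pending_.end()) return false;
    it->second += suffix;
    return true;
  }

  // The flowchain finished successfully: persist everything once.
  std::size_t commit() {
    std::size_t written = 0;
    for (auto& [id, data] : pending_) {
      store_[id] = std::move(data);
      ++written;
    }
    pending_.clear();
    return written;
  }

  // Any failure: drop staged content, leaving the store untouched.
  void rollback() { pending_.clear(); }

 private:
  std::map<std::string, std::string>& store_;   // simulated persistent repo
  std::map<std::string, std::string> pending_;  // content held in memory
};
```

In this sketch, any number of modifications between put() and commit() costs a 
single persistent write per flowfile, and a rollback costs none. 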

I think this topic is definitely worth some brainstorming; I will create a Jira 
to collect ideas. 

> Enhance ExtractText to have more feature parity with the Java impl
> ------------------------------------------------------------------
>
>                 Key: MINIFICPP-726
>                 URL: https://issues.apache.org/jira/browse/MINIFICPP-726
>             Project: NiFi MiNiFi C++
>          Issue Type: New Feature
>            Reporter: Aldrin Piri
>            Assignee: Arpad Boda
>            Priority: Major
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> ExtractText is limited in terms of functionality in contrast to the Java 
> variant 
> https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.8.0/org.apache.nifi.processors.standard.ExtractText/index.html.
> Currently, the processor only allows promoting the entirety of the content to 
> an attribute.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
