[ 
https://issues.apache.org/jira/browse/NIFI-399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14369453#comment-14369453
 ] 

ASF subversion and git services commented on NIFI-399:
------------------------------------------------------

Commit ad18853b589d80331e2f4574bce35d79bce09c28 in incubator-nifi's branch 
refs/heads/develop from [~joewitt]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-nifi.git;h=ad18853 ]

NIFI-399 initial port


> Rename EvaluateRegularExpression to ExtractText and optimize
> ------------------------------------------------------------
>
>                 Key: NIFI-399
>                 URL: https://issues.apache.org/jira/browse/NIFI-399
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Extensions
>            Reporter: Joseph Witt
>            Assignee: Joseph Witt
>              Labels: deprecation
>             Fix For: 0.1.0
>
>         Attachments: NIFI-399.patch
>
>
> The processor EvaluateRegularExpression enables some cool extraction of text 
> from data.  It currently limits each match to a single result.  It should be 
> updated to allow multiple capture groups per matching term.  It can keep the 
> current behavior, but can also include all matching groups 0..n, with the 
> group index appended to the basename of the attribute.
> In addition the name of this processor (and possibly its tags) needs to be 
> updated.  The processor is used to extract text from a given document.  The 
> name should be 'ExtractText'.  We can deprecate the old processor in 0.1.0 
> and in 0.2.0 pull it out. 
> In addition this processor should:
> - Precompile all patterns when the processor is scheduled to run.
> - Create memory buffers no larger than the minimum of the flow file content 
> size and the specified max buffer size.
> - Support more than one capturing group.  The default behavior of storing 
> capture group 1 at the given name is good.  But there is also benefit to 
> supporting multiple capture groups in a single execution.
> - Allow the user to specify the maximum length of a capturing group value
> This also prompts the need for a StandardValidator which allows for creation 
> of a validator that does a bounds check on a given DataSize.
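
The requirements listed in the description can be illustrated with a minimal sketch. This is not the code from the commit or the attached patch; the class name ExtractTextSketch, its constructor parameters, and the "basename.index" attribute naming (the dot separator in particular) are assumptions made for illustration only. It shows patterns compiled once up front, a buffer bounded by the smaller of the content length and a maximum size, every capture group 0..n stored under an indexed name, group 1 also stored at the plain basename, and a cap on the length of each captured value.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the NiFi processor itself.
public class ExtractTextSketch {
    // Compiled once here, analogous to precompiling when the processor is scheduled.
    private final Map<String, Pattern> compiled = new LinkedHashMap<>();
    private final int maxBufferSize;     // upper bound on bytes examined
    private final int maxCaptureLength;  // upper bound on characters kept per group

    public ExtractTextSketch(Map<String, String> regexByAttribute,
                             int maxBufferSize, int maxCaptureLength) {
        for (Map.Entry<String, String> e : regexByAttribute.entrySet()) {
            compiled.put(e.getKey(), Pattern.compile(e.getValue()));
        }
        this.maxBufferSize = maxBufferSize;
        this.maxCaptureLength = maxCaptureLength;
    }

    public Map<String, String> extract(byte[] content) {
        // Examine no more than min(content size, max buffer size) bytes.
        // Note: cutting at a byte boundary may split a multi-byte character.
        int length = Math.min(content.length, maxBufferSize);
        String text = new String(content, 0, length, StandardCharsets.UTF_8);

        Map<String, String> attributes = new LinkedHashMap<>();
        for (Map.Entry<String, Pattern> e : compiled.entrySet()) {
            String baseName = e.getKey();
            Matcher matcher = e.getValue().matcher(text);
            if (!matcher.find()) {
                continue; // no match for this pattern; add nothing
            }
            for (int i = 0; i <= matcher.groupCount(); i++) {
                String value = matcher.group(i);
                if (value == null) {
                    continue; // optional group that did not participate
                }
                if (value.length() > maxCaptureLength) {
                    value = value.substring(0, maxCaptureLength);
                }
                // Assumed naming: group index appended to the basename.
                attributes.put(baseName + "." + i, value);
                if (i == 1) {
                    // Default behavior: group 1 is also stored at the given name.
                    attributes.put(baseName, value);
                }
            }
        }
        return attributes;
    }
}
```

With the pattern `(\w+)@(\w+)\.com` registered under the name "email", the content "contact: joe@example.com now" would yield email.0, email.1, email.2, and email (equal to email.1). A real processor would also need the DataSize bounds-check validator mentioned above to validate the two size limits.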



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
