[ 
https://issues.apache.org/jira/browse/NIFI-399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Witt updated NIFI-399:
-----------------------------
    Description: 
The EvaluateRegularExpression processor enables some cool extraction of text 
from data.  It currently limits the output to a single matching result.  
It should be updated to allow multiple capture groups per matching term.  It 
can keep the current behavior, but should also include all matching 
groups 0..n, each with its index appended to the base name of the attribute.

In addition, the name of this processor (and possibly its tags) needs to be 
updated.  The processor is used to extract text from a given document, so the 
name should be 'ExtractText'.  We can deprecate the old processor in 0.1.0 and 
remove it in 0.2.0.

In addition, this processor should:
- Precompile all patterns when the processor is scheduled to run.
- Create memory buffers no larger than the minimum of the flow file content 
size and the specified max buffer size.
- Support more than one capturing group.  The default behavior of storing 
capture group 1 under the given name is good, but there is also benefit to 
supporting multiple capture groups in a single execution.
- Allow the user to specify the maximum length of a capturing group value.
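
The four points above can be sketched in plain Java roughly as follows.  This 
is only an illustration under stated assumptions, not the processor's actual 
code: the class name ExtractTextSketch, the constructor parameters, and the 
choice to pass content as a byte array are all hypothetical, and a real 
processor would read from the flow file's content stream instead.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; none of these names come from the NiFi code base.
class ExtractTextSketch {
    // Patterns are compiled once up front (standing in for "when scheduled"),
    // then reused for every piece of content.
    private final Map<String, Pattern> compiled = new LinkedHashMap<>();
    private final int maxBufferSize;
    private final int maxCaptureGroupLength;

    ExtractTextSketch(Map<String, String> regexByAttribute,
                      int maxBufferSize, int maxCaptureGroupLength) {
        for (Map.Entry<String, String> e : regexByAttribute.entrySet()) {
            compiled.put(e.getKey(), Pattern.compile(e.getValue()));
        }
        this.maxBufferSize = maxBufferSize;
        this.maxCaptureGroupLength = maxCaptureGroupLength;
    }

    Map<String, String> extract(byte[] content) {
        // Examine no more than min(content size, max buffer size) bytes.
        // Note: cutting at a byte boundary may split a multi-byte character.
        int length = Math.min(content.length, maxBufferSize);
        String text = new String(content, 0, length, StandardCharsets.UTF_8);

        Map<String, String> attributes = new LinkedHashMap<>();
        for (Map.Entry<String, Pattern> e : compiled.entrySet()) {
            String baseName = e.getKey();
            Matcher matcher = e.getValue().matcher(text);
            if (!matcher.find()) {
                continue; // no match for this expression, no attributes added
            }
            // Group 0 is the whole match; groups 1..n are the capture groups.
            for (int i = 0; i <= matcher.groupCount(); i++) {
                String value = matcher.group(i);
                if (value == null) {
                    continue; // optional group that did not participate
                }
                if (value.length() > maxCaptureGroupLength) {
                    value = value.substring(0, maxCaptureGroupLength);
                }
                attributes.put(baseName + "." + i, value);
                if (i == 1) {
                    // Keep the existing behavior: group 1 under the bare name.
                    attributes.put(baseName, value);
                }
            }
        }
        return attributes;
    }
}
```

With a regex such as (\w+)@(\w+) registered under the name "user", this 
sketch would produce the attributes user, user.0, user.1 and user.2, each 
truncated to the configured maximum length.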


  was:
The EvaluateRegularExpression processor enables some cool extraction of text 
from data.  It currently limits the output to a single matching result.  
It should be updated to allow multiple capture groups per matching term.  It 
can keep the current behavior, but should also include all matching 
groups 0..n, each with its index appended to the base name of the attribute.

In addition, the name of this processor (and possibly its tags) needs to be 
updated.  The processor is used to extract text from a given document, so the 
name should be 'ExtractText'.  We can deprecate the old processor in 0.1.0 and 
remove it in 0.2.0.


> Rename EvaluateRegularExpression to ExtractText and optimize
> ------------------------------------------------------------
>
>                 Key: NIFI-399
>                 URL: https://issues.apache.org/jira/browse/NIFI-399
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Extensions
>            Reporter: Joseph Witt
>            Assignee: Joseph Witt
>             Fix For: 0.1.0
>
>
> The EvaluateRegularExpression processor enables some cool extraction of text 
> from data.  It currently limits the output to a single matching result.  
> It should be updated to allow multiple capture groups per matching term.  It 
> can keep the current behavior, but should also include all matching 
> groups 0..n, each with its index appended to the base name of the attribute.
> In addition, the name of this processor (and possibly its tags) needs to be 
> updated.  The processor is used to extract text from a given document, so the 
> name should be 'ExtractText'.  We can deprecate the old processor in 0.1.0 
> and remove it in 0.2.0.
> In addition, this processor should:
> - Precompile all patterns when the processor is scheduled to run.
> - Create memory buffers no larger than the minimum of the flow file content 
> size and the specified max buffer size.
> - Support more than one capturing group.  The default behavior of storing 
> capture group 1 under the given name is good, but there is also benefit to 
> supporting multiple capture groups in a single execution.
> - Allow the user to specify the maximum length of a capturing group value.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
