[ https://issues.apache.org/jira/browse/NIFI-399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14369456#comment-14369456 ]
Joseph Witt commented on NIFI-399:
----------------------------------
WOOHOO! My git ignorance declined slightly. I did this without a merge
commit. Rebase...ftw.
> Rename EvaluateRegularExpression to ExtractText and optimize
> ------------------------------------------------------------
>
> Key: NIFI-399
> URL: https://issues.apache.org/jira/browse/NIFI-399
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Extensions
> Reporter: Joseph Witt
> Assignee: Joseph Witt
> Labels: deprecation
> Fix For: 0.1.0
>
> Attachments: NIFI-399.patch
>
>
> The processor EvaluateRegularExpression enables some cool extraction of text
> from data. It currently limits each matching term to a single result.
> It should be updated to allow multiple capture groups per matching term. It
> can keep the current behavior, but it should also include all matching
> groups 0..n, with the group index appended to the basename of the attribute.
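
The attribute naming described above can be sketched as follows. This is only an illustration, not the actual processor code: the class name `CaptureGroupAttributes`, the method `extract`, and the `.` separator between basename and index are all assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not taken from the NiFi code base.
class CaptureGroupAttributes {

    // Returns attributes for the first match of `pattern` in `content`.
    // Assumed naming: "<baseName>.<i>" holds group i for i in 0..n, and
    // "<baseName>" alone holds group 1 (the existing default behavior).
    static Map<String, String> extract(String baseName, Pattern pattern, String content) {
        Map<String, String> attributes = new LinkedHashMap<>();
        Matcher matcher = pattern.matcher(content);
        if (!matcher.find()) {
            return attributes; // no match, so no attributes are produced
        }
        for (int i = 0; i <= matcher.groupCount(); i++) {
            String value = matcher.group(i);
            if (value == null) {
                continue; // this group did not participate in the match
            }
            attributes.put(baseName + "." + i, value);
            if (i == 1) {
                attributes.put(baseName, value); // keep the current behavior
            }
        }
        return attributes;
    }
}
```

With the pattern `(\w+)@(\w+)` and the basename `user`, the text `mail joe@example now` would yield `user.0`, `user.1`, `user.2`, and `user` (equal to `user.1`). A pattern without any capture group would only produce `user.0` under these assumptions.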
> In addition, the name of this processor (and possibly its tags) needs to be
> updated. The processor is used to extract text from a given document, so the
> name should be 'ExtractText'. We can deprecate the old processor in 0.1.0
> and remove it in 0.2.0.
> In addition this processor should:
> - Precompile all patterns when the processor is scheduled to run.
> - Create memory buffers no larger than the smaller of the flow file content
> size and the specified max buffer size.
> - Support more than one capturing group. The default behavior of storing
> capture group 1 at the given name is good. But there is also benefit to
> supporting multiple capture groups in a single execution.
> - Allow the user to specify the maximum length of a capturing group value.
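
A rough sketch of how the items in this list might fit together is below. Everything in it is an assumption for illustration: the class `ExtractionSettings`, its method names, the choice to truncate over-long values (rather than reject them), and measuring the capture length in characters.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical settings holder for illustration; not taken from the NiFi code base.
class ExtractionSettings {
    private final Map<String, Pattern> compiled = new LinkedHashMap<>();
    private final int maxBufferBytes;
    private final int maxCaptureLength;

    // Meant to be built once, when the processor is scheduled, so that no
    // Pattern.compile call happens while individual flow files are processed.
    ExtractionSettings(Map<String, String> regexByAttribute, int maxBufferBytes, int maxCaptureLength) {
        for (Map.Entry<String, String> entry : regexByAttribute.entrySet()) {
            compiled.put(entry.getKey(), Pattern.compile(entry.getValue()));
        }
        this.maxBufferBytes = maxBufferBytes;
        this.maxCaptureLength = maxCaptureLength;
    }

    // Buffer size is the smaller of the content size and the configured maximum.
    int bufferSizeFor(long contentSizeBytes) {
        return (int) Math.min(contentSizeBytes, (long) maxBufferBytes);
    }

    // Assumption: values longer than the limit are truncated, counted in chars.
    String limit(String captured) {
        return captured.length() <= maxCaptureLength
                ? captured
                : captured.substring(0, maxCaptureLength);
    }

    // Returns the precompiled pattern for an attribute, or null if none is configured.
    Pattern patternFor(String attributeName) {
        return compiled.get(attributeName);
    }
}
```

Sizing the buffer from the smaller of the two values avoids allocating the full maximum for small flow files, while still capping memory use for large ones.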
> This also prompts the need for a StandardValidator which allows for creation
> of a validator that does a bounds check on a given DataSize.
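
As a standalone illustration of the kind of bounds check meant here, the sketch below builds a predicate that accepts a data size string only if it falls within a byte range. It does not use NiFi's validator API; the class `DataSizeBounds`, the accepted units, and the 1 KB = 1024 B convention are assumptions for the example.

```java
import java.util.Locale;
import java.util.function.Predicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for a data size bounds validator; not the NiFi API.
class DataSizeBounds {
    private static final Pattern DATA_SIZE = Pattern.compile("(\\d+)\\s*(B|KB|MB|GB)");

    // Parses values such as "1 MB" into bytes, assuming 1 KB = 1024 B.
    // Overflow for very large values is not handled in this sketch.
    static long toBytes(String value) {
        Matcher m = DATA_SIZE.matcher(value.trim().toUpperCase(Locale.ROOT));
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a data size: " + value);
        }
        long number = Long.parseLong(m.group(1));
        switch (m.group(2)) {
            case "KB": return number * 1024L;
            case "MB": return number * 1024L * 1024L;
            case "GB": return number * 1024L * 1024L * 1024L;
            default:   return number; // plain bytes
        }
    }

    // Creates a validator that accepts sizes within [minBytes, maxBytes].
    static Predicate<String> boundsValidator(long minBytes, long maxBytes) {
        return value -> {
            try {
                long bytes = toBytes(value);
                return bytes >= minBytes && bytes <= maxBytes;
            } catch (IllegalArgumentException e) {
                return false; // unparseable input is treated as invalid
            }
        };
    }
}
```

For example, `DataSizeBounds.boundsValidator(1, 1024 * 1024)` would accept `"512 KB"` and `"1 MB"` but reject `"2 MB"`, `"0 B"`, and non-numeric input.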
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)