[ https://issues.apache.org/jira/browse/NIFI-399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14369453#comment-14369453 ]
ASF subversion and git services commented on NIFI-399:
------------------------------------------------------
Commit ad18853b589d80331e2f4574bce35d79bce09c28 in incubator-nifi's branch
refs/heads/develop from [~joewitt]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-nifi.git;h=ad18853 ]
NIFI-399 initial port
> Rename EvaluateRegularExpression to ExtractText and optimize
> ------------------------------------------------------------
>
> Key: NIFI-399
> URL: https://issues.apache.org/jira/browse/NIFI-399
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Extensions
> Reporter: Joseph Witt
> Assignee: Joseph Witt
> Labels: deprecation
> Fix For: 0.1.0
>
> Attachments: NIFI-399.patch
>
>
> The processor EvaluateRegularExpression enables some cool extraction of text
> from data, but it currently stores only a single result per expression
> (capture group 1). It should be updated to allow multiple capture groups per
> matching term. It can keep the current behavior, but it can also add every
> matching group 0..n as its own attribute, named by appending the group index
> to the attribute's base name.
> In addition, the name of this processor (and possibly its tags) needs to be
> updated. The processor is used to extract text from a given document, so the
> name should be 'ExtractText'. We can deprecate the old processor in 0.1.0
> and remove it in 0.2.0.
> In addition, this processor should:
> - Precompile all patterns when the processor is scheduled to run.
> - Create memory buffers no larger than the smaller of the flow file content
> size and the specified maximum buffer size.
> - Support more than one capturing group. The default behavior of storing
> capture group 1 at the given name is good, but there is also benefit in
> supporting multiple capture groups in a single execution.
> - Allow the user to specify the maximum length of a capturing group value.
> This also calls for a StandardValidator that can create a validator which
> performs a bounds check on a given DataSize.
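The capture-group proposal, together with the "precompile all patterns" and "maximum length of a capturing group value" bullets, can be sketched in plain Java. This is a minimal sketch, not NiFi's ExtractText source: it assumes java.util.regex, a naming scheme of `<base>.<index>` for groups 0..n, only the first match per pattern, and a character-count cap on values. CaptureGroupExtractor and its methods are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not NiFi code: the class name and the "<base>.<index>"
// attribute scheme are assumptions for illustration.
final class CaptureGroupExtractor {
    // Compiled once up front (e.g. when the processor is scheduled),
    // keyed by attribute base name, so every FlowFile reuses them.
    private final Map<String, Pattern> patterns = new LinkedHashMap<>();
    private final int maxValueLength;

    CaptureGroupExtractor(Map<String, String> regexByBaseName, int maxValueLength) {
        if (maxValueLength < 0) {
            throw new IllegalArgumentException("maxValueLength must be >= 0");
        }
        for (Map.Entry<String, String> entry : regexByBaseName.entrySet()) {
            patterns.put(entry.getKey(), Pattern.compile(entry.getValue()));
        }
        this.maxValueLength = maxValueLength;
    }

    // For the first match of each pattern: keeps the existing behavior
    // (group 1 stored under the base name) and also stores every group
    // 0..n under "<base>.<index>", each value capped at maxValueLength chars.
    Map<String, String> extract(CharSequence text) {
        Map<String, String> attributes = new LinkedHashMap<>();
        for (Map.Entry<String, Pattern> entry : patterns.entrySet()) {
            Matcher matcher = entry.getValue().matcher(text);
            if (!matcher.find()) {
                continue; // no match: no attributes for this base name
            }
            String base = entry.getKey();
            for (int i = 0; i <= matcher.groupCount(); i++) {
                String value = matcher.group(i);
                if (value == null) {
                    continue; // optional group that did not participate
                }
                String capped = value.length() <= maxValueLength
                        ? value
                        : value.substring(0, maxValueLength);
                if (i == 1) {
                    attributes.put(base, capped); // current single-result behavior
                }
                attributes.put(base + "." + i, capped);
            }
        }
        return attributes;
    }
}
```

For example, a pattern named `date` with three groups, run against text containing `2015-03-19`, yields `date` (group 1) plus `date.0` through `date.3`. Compiling in the constructor mirrors compiling once at schedule time rather than once per FlowFile.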
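The buffer-sizing bullet amounts to allocating min(content size, maximum buffer size) instead of always allocating the maximum. A minimal sketch, assuming the content arrives as an InputStream and is decoded with a caller-supplied charset; BoundedContentReader is a hypothetical name, and InputStream.readNBytes(int) requires Java 11 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not NiFi code: sizes the read buffer to
// min(content size, max buffer size) rather than always using the max.
final class BoundedContentReader {
    static int bufferSize(long contentSize, long maxBufferSize) {
        if (contentSize < 0 || maxBufferSize < 0) {
            throw new IllegalArgumentException("sizes must be non-negative");
        }
        long size = Math.min(contentSize, maxBufferSize);
        if (size > Integer.MAX_VALUE) {
            // Java byte arrays are int-indexed
            throw new IllegalArgumentException("buffer too large: " + size);
        }
        return (int) size;
    }

    // Reads at most bufferSize(...) bytes and decodes them. A multi-byte
    // character cut off at the limit decodes as a replacement character.
    static String readText(InputStream in, long contentSize, long maxBufferSize, Charset charset) {
        try {
            byte[] bytes = in.readNBytes(bufferSize(contentSize, maxBufferSize));
            return new String(bytes, charset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static String readUtf8(InputStream in, long contentSize, long maxBufferSize) {
        return readText(in, contentSize, maxBufferSize, StandardCharsets.UTF_8);
    }
}
```

A small FlowFile therefore gets a buffer sized to its own content, while a large one is read only up to the configured limit.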
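The closing note asks for a StandardValidator that performs a bounds check on a DataSize. The sketch below shows only the check itself in plain Java; it does not use NiFi's Validator or ValidationResult types. DataSizeBoundsValidator, the accepted unit suffixes (B, KB, MB, GB, TB), and the binary convention 1 KB = 1024 B are all assumptions.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical standalone sketch, not NiFi's StandardValidators API.
// Assumes sizes written like "512 KB" and binary units (1 KB = 1024 B).
final class DataSizeBoundsValidator {
    private static final Pattern DATA_SIZE =
            Pattern.compile("(\\d+(?:\\.\\d+)?)\\s*(B|KB|MB|GB|TB)", Pattern.CASE_INSENSITIVE);

    private final long minBytesInclusive;
    private final long maxBytesInclusive;

    DataSizeBoundsValidator(long minBytesInclusive, long maxBytesInclusive) {
        if (minBytesInclusive > maxBytesInclusive) {
            throw new IllegalArgumentException("min must not exceed max");
        }
        this.minBytesInclusive = minBytesInclusive;
        this.maxBytesInclusive = maxBytesInclusive;
    }

    // Parses "<number> <unit>" into a byte count.
    static long toBytes(String input) {
        Matcher m = DATA_SIZE.matcher(input.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("'" + input + "' is not a data size");
        }
        long multiplier;
        switch (m.group(2).toUpperCase(Locale.ROOT)) {
            case "B":  multiplier = 1L; break;
            case "KB": multiplier = 1L << 10; break;
            case "MB": multiplier = 1L << 20; break;
            case "GB": multiplier = 1L << 30; break;
            default:   multiplier = 1L << 40; break; // TB
        }
        return (long) (Double.parseDouble(m.group(1)) * multiplier);
    }

    // Returns an explanation if the value is invalid, or empty if it passes.
    Optional<String> validate(String input) {
        if (input == null) {
            return Optional.of("a data size is required");
        }
        long bytes;
        try {
            bytes = toBytes(input);
        } catch (IllegalArgumentException e) {
            return Optional.of(e.getMessage());
        }
        if (bytes < minBytesInclusive || bytes > maxBytesInclusive) {
            return Optional.of("'" + input + "' must be between " + minBytesInclusive
                    + " and " + maxBytesInclusive + " bytes");
        }
        return Optional.empty();
    }
}
```

Keeping parsing separate from the range check is a design choice here: the same parser could back both a maximum buffer size and a maximum capture-group length property.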