[
https://issues.apache.org/jira/browse/NIFI-399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14368879#comment-14368879
]
Matt Gilman commented on NIFI-399:
----------------------------------
+1. My only comment is that the description of ExtractText was confusing at
first because it said the result of the capturing group would be 'placed into
that attribute name'. On first reading I thought it meant that the attribute
names would come from the flowfile content.

In EvaluateRegularExpression you've removed the tags, marked it deprecated, and
added to the description. Would removing the existing description be too much?
> Rename EvaluateRegularExpression to ExtractText and optimize
> ------------------------------------------------------------
>
> Key: NIFI-399
> URL: https://issues.apache.org/jira/browse/NIFI-399
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Extensions
> Reporter: Joseph Witt
> Assignee: Joseph Witt
> Labels: deprecation
> Fix For: 0.1.0
>
> Attachments: NIFI-399.patch
>
>
> The processor EvaluateRegularExpression enables some cool extraction of text
> from data. It currently returns only a single result per matching term. It
> should be updated to allow multiple capture groups per matching term. It can
> keep the current behavior, but can also include all matching groups 0..n,
> with the index appended to the basename of the attribute.
>
> In addition, the name of this processor (and possibly its tags) needs to be
> updated. The processor is used to extract text from a given document. The
> name should be 'ExtractText'. We can deprecate the old processor in 0.1.0
> and pull it out in 0.2.0.
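
To make the proposed attribute naming concrete, here is a minimal sketch in
plain Java. It is not the code from the attached patch: the class name
CaptureGroupNaming, the method extract, and the "." separator between the
basename and the group index are all assumptions made for illustration. The
attribute name comes from the configured basename, never from the content;
only the values come from the content.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not NiFi code: illustrates the naming scheme only.
class CaptureGroupNaming {

    // Returns attribute name -> value for the first match of `pattern` in `content`.
    static Map<String, String> extract(String baseName, Pattern pattern, String content) {
        Map<String, String> attributes = new LinkedHashMap<>();
        Matcher matcher = pattern.matcher(content);
        if (!matcher.find()) {
            return attributes; // no match, so no attributes are produced
        }
        // Current behavior: capture group 1 is stored under the basename itself.
        if (matcher.groupCount() >= 1 && matcher.group(1) != null) {
            attributes.put(baseName, matcher.group(1));
        }
        // Proposed addition: each group 0..n is stored under basename + "." + index.
        // The "." separator is an assumption; the issue only says "appended".
        for (int i = 0; i <= matcher.groupCount(); i++) {
            String value = matcher.group(i);
            if (value != null) { // a group that did not take part in the match is null
                attributes.put(baseName + "." + i, value);
            }
        }
        return attributes;
    }

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("(\\w+)@(\\w+\\.\\w+)");
        Map<String, String> attrs = extract("email", pattern, "contact: jane@example.org");
        attrs.forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

With the basename "email", the sketch yields the attributes email, email.0,
email.1 and email.2, where email and email.1 both hold the first capture group.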
> In addition, this processor should:
> - Precompile all patterns when the processor is scheduled to run.
> - Create memory buffers no larger than the smaller of the flow file content
>   size and the specified max buffer size.
> - Support more than one capturing group. The default behavior of storing
>   capture group 1 at the given name is good, but there is also benefit to
>   supporting multiple capture groups in a single execution.
> - Allow the user to specify the maximum length of a capturing group value.
>
> This also prompts the need for a StandardValidator which allows for creation
> of a validator that does a bounds check on a given DataSize.
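
The buffer, capture-length and bounds-check items above can be sketched in
plain Java as follows. This is an illustration under stated assumptions, not
the patch and not the NiFi API: the class BoundedExtraction and its methods
are hypothetical, sizes are plain byte counts rather than NiFi DataSize
strings, and the content is assumed to be UTF-8.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not NiFi code.
class BoundedExtraction {

    // Stand-in for the proposed validator: true if sizeBytes lies in [minBytes, maxBytes].
    static boolean isWithinBounds(long sizeBytes, long minBytes, long maxBytes) {
        return sizeBytes >= minBytes && sizeBytes <= maxBytes;
    }

    // Buffer size is the smaller of the content length and the configured maximum.
    static int bufferSize(long contentLength, int maxBufferSize) {
        return (int) Math.min(contentLength, maxBufferSize);
    }

    // Returns capture group 1 of the first match, cut to maxCaptureLength characters,
    // or null if there is no match or no group 1. Only the first bufferSize(...)
    // bytes are examined; cutting there may split a multi-byte UTF-8 character.
    static String firstGroup(Pattern precompiled, byte[] content,
                             int maxBufferSize, int maxCaptureLength) {
        int size = bufferSize(content.length, maxBufferSize);
        String text = new String(content, 0, size, StandardCharsets.UTF_8);
        Matcher matcher = precompiled.matcher(text);
        if (!matcher.find() || matcher.groupCount() < 1 || matcher.group(1) == null) {
            return null;
        }
        String value = matcher.group(1);
        return value.length() > maxCaptureLength
                ? value.substring(0, maxCaptureLength)
                : value;
    }

    public static void main(String[] args) {
        // Compile once up front (the analogue of "when the processor is scheduled"),
        // then reuse the same Pattern for every piece of content.
        Pattern idPattern = Pattern.compile("id=(\\d+)");
        String[] contents = {"id=123456", "name=x id=42", "nothing here"};
        for (String content : contents) {
            byte[] bytes = content.getBytes(StandardCharsets.UTF_8);
            System.out.println(firstGroup(idPattern, bytes, 1024, 3));
        }
    }
}
```

Compiling the pattern once and reusing it avoids recompiling the regular
expression for every flow file, and capping both the buffer and the captured
value keeps memory use and attribute size predictable.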
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)