[
https://issues.apache.org/jira/browse/NIFI-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15515799#comment-15515799
]
ASF GitHub Bot commented on NIFI-2071:
--------------------------------------
Github user ijokarumawak commented on the issue:
https://github.com/apache/nifi/pull/1050
Thanks @pvillard31, this enhancement would be useful!
I reviewed the change and played with the unit tests, and I have three comments
I'd like to share.
### 1. Change of default behavior
I'm worried about the effect on existing dataflows. Since there's no
guarantee that nobody has intentionally relied on the original behavior,
I would prefer to add a new processor property to enable this feature,
such as 'Enable Repeating Capture Groups: true/false', in order to keep
current configurations intact.
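Here is a minimal sketch of how such an opt-in flag could gate the new behavior. It uses plain `java.util.regex` rather than NiFi's processor API, and the class, method, and `enableRepeating` parameter names are hypothetical stand-ins for the proposed property. With the flag at its default of `false`, only the first match is used, which is what ExtractText does today.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatingCaptureFlagSketch {

    // Hypothetical stand-in for the proposed 'Enable Repeating Capture Groups'
    // property; defaults to false so existing flows keep the old behavior.
    static final boolean DEFAULT_ENABLE_REPEATING = false;

    // Returns the whole-match text (group 0) of every match that would be used.
    static List<String> matchesUsed(Pattern pattern, String content, boolean enableRepeating) {
        List<String> used = new ArrayList<>();
        Matcher matcher = pattern.matcher(content);
        while (matcher.find()) {
            used.add(matcher.group());
            if (!enableRepeating) {
                break; // original behavior: stop after the first match
            }
        }
        return used;
    }

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("(\\w+)=(\\d+)");
        String content = "a=1,b=10,c=100";
        System.out.println(matchesUsed(pattern, content, DEFAULT_ENABLE_REPEATING)); // [a=1]
        System.out.println(matchesUsed(pattern, content, true)); // [a=1, b=10, c=100]
    }
}
```

Keeping the default at `false` means a flow upgraded to a release containing the change sees identical attributes until someone explicitly turns the property on.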
### 2. Processor documentation
The commit doesn't update the processor description, which contains this
sentence:
> If the Regular Expression matches more than once, only the first match will be used.
At the very least, this should be updated.
### 3. Test case to clarify behavior
I was wondering what happens if multiple capturing groups are specified and
the regex matches more than once. Would you be interested in adding the
following test case? Additional documentation on how the repeated capture
groups are stored under indexed attribute names would also be helpful.
```Java
@Test
public void testFindAllPair() throws Exception {
final TestRunner testRunner = TestRunners.newTestRunner(new ExtractText());
final String attributeKey = "regex.result";
testRunner.setProperty(attributeKey, "(\\w+)=(\\d+)");
testRunner.enqueue("a=1,b=10,c=100".getBytes("UTF-8"));
testRunner.run();
testRunner.assertAllFlowFilesTransferred(ExtractText.REL_MATCH, 1);
final MockFlowFile out = testRunner.getFlowFilesForRelationship(ExtractText.REL_MATCH).get(0);
// Ensure group 0 of the first match and all indexed capture groups are present
out.assertAttributeExists(attributeKey + ".0");
out.assertAttributeExists(attributeKey + ".1");
out.assertAttributeExists(attributeKey + ".2");
out.assertAttributeExists(attributeKey + ".3");
out.assertAttributeExists(attributeKey + ".4");
out.assertAttributeExists(attributeKey + ".5");
out.assertAttributeExists(attributeKey + ".6");
out.assertAttributeNotExists(attributeKey + ".7"); // Ensure there are no more attributes
out.assertAttributeEquals(attributeKey, "a");
out.assertAttributeEquals(attributeKey + ".0", "a=1");
out.assertAttributeEquals(attributeKey + ".1", "a");
out.assertAttributeEquals(attributeKey + ".2", "1");
out.assertAttributeEquals(attributeKey + ".3", "b");
out.assertAttributeEquals(attributeKey + ".4", "10");
out.assertAttributeEquals(attributeKey + ".5", "c");
out.assertAttributeEquals(attributeKey + ".6", "100");
}
```
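For reference, the attribute layout that `testFindAllPair` asserts can be reproduced in plain Java. This is a sketch derived only from the test's assertions, not from the PR's ExtractText code, and the class and method names are hypothetical. Under this layout the base key holds group 1 of the first match, `.0` holds the whole first match, and groups 1..n of every match are numbered consecutively from `.1`. The whole-match text of later matches (`b=10`, `c=100`) is not stored anywhere.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IndexedCaptureAttributesSketch {

    // Builds attributes in the layout implied by testFindAllPair:
    //   key      -> group 1 of the first match
    //   key.0    -> group 0 (whole text) of the first match
    //   key.1... -> groups 1..n of every match, numbered across matches
    static Map<String, String> indexedAttributes(String key, Pattern pattern, String content) {
        Map<String, String> attributes = new LinkedHashMap<>();
        Matcher matcher = pattern.matcher(content);
        int index = 1;
        boolean first = true;
        while (matcher.find()) {
            if (first) {
                // Only set the base key when the regex has a capture group.
                if (matcher.groupCount() >= 1) {
                    attributes.put(key, matcher.group(1));
                }
                attributes.put(key + ".0", matcher.group(0));
                first = false;
            }
            for (int g = 1; g <= matcher.groupCount(); g++) {
                attributes.put(key + "." + index++, matcher.group(g));
            }
        }
        return attributes;
    }

    public static void main(String[] args) {
        Map<String, String> attrs = indexedAttributes(
                "regex.result", Pattern.compile("(\\w+)=(\\d+)"), "a=1,b=10,c=100");
        attrs.forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Spelling the layout out like this makes it easier to decide whether dropping the later whole matches is intended before documenting it.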
> Support repeating capture groups in ExtractText
> -----------------------------------------------
>
> Key: NIFI-2071
> URL: https://issues.apache.org/jira/browse/NIFI-2071
> Project: Apache NiFi
> Issue Type: Improvement
> Reporter: Joey Frazee
> Assignee: Pierre Villard
> Fix For: 1.1.0
>
>
> ExtractText doesn't currently support repeating capture groups, so any
> repeating patterns have to be specified by hand and can only be repeated a fixed
> number of times.
> I think this is because it only uses find() and not findAll() in its pattern
> matching [1].
> 1. https://github.com/apache/nifi/blob/1bd2cf0d09a7111bcecffd0f473aa71c25a69845/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/ExtractText.java#L324
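Note that `java.util.regex.Matcher` has no `findAll()` method; getting every match means calling `find()` repeatedly, or (on Java 9 and later) using `Matcher.results()`. The sketch below contrasts a single `find()` call with collecting all matches; the class and method names are hypothetical, and NiFi builds targeting Java 8 would need the `while (matcher.find())` loop instead of `results()`.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class FindVersusAllMatches {

    // A single find() call: only the first match is seen.
    static String firstMatch(Pattern pattern, String input) {
        Matcher matcher = pattern.matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    // Every match via Matcher.results() (Java 9+), which repeatedly
    // applies find() and exposes each match as a MatchResult.
    static List<String> allMatches(Pattern pattern, String input) {
        return pattern.matcher(input).results()
                .map(MatchResult::group)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("(\\w+)=(\\d+)");
        System.out.println(firstMatch(pattern, "a=1,b=10,c=100")); // a=1
        System.out.println(allMatches(pattern, "a=1,b=10,c=100")); // [a=1, b=10, c=100]
    }
}
```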