Github user ijokarumawak commented on the issue:
https://github.com/apache/nifi/pull/1050
Thanks @pvillard31 this enhancement would be useful!
I reviewed the change and played with the unit tests, and I have three
comments I'd like to share.
### 1. Change of default behavior
I'm worried about the effect on existing data flows. Since there's no
guarantee that nobody relies on the original behavior intentionally, I would
prefer to add a new Processor property that enables this feature, such as
'Enable Repeating Capture Groups: true/false', in order to keep current
configurations intact.
### 2. Processor documentation
The commit doesn't update the processor description, which still contains
this sentence:
> If the Regular Expression matches more than once, only the first match
> will be used.

This should be updated, at the very least.
### 3. Test case to clarify behavior
I was wondering what happens when the regex contains multiple capture groups
and matches more than once. Would you be interested in adding the following
test case? Additional documentation on how the repeated capture groups are
stored under indexed attribute names would probably be helpful, too.
```Java
@Test
public void testFindAllPair() throws Exception {
    final TestRunner testRunner = TestRunners.newTestRunner(new ExtractText());
    final String attributeKey = "regex.result";

    // Two capture groups; the pattern matches three times in the input below
    testRunner.setProperty(attributeKey, "(\\w+)=(\\d+)");
    testRunner.enqueue("a=1,b=10,c=100".getBytes("UTF-8"));
    testRunner.run();

    testRunner.assertAllFlowFilesTransferred(ExtractText.REL_MATCH, 1);
    final MockFlowFile out =
            testRunner.getFlowFilesForRelationship(ExtractText.REL_MATCH).get(0);

    // Ensure the zero capture group is in the resultant attributes
    out.assertAttributeExists(attributeKey + ".0");
    out.assertAttributeExists(attributeKey + ".1");
    out.assertAttributeExists(attributeKey + ".2");
    out.assertAttributeExists(attributeKey + ".3");
    out.assertAttributeExists(attributeKey + ".4");
    out.assertAttributeExists(attributeKey + ".5");
    out.assertAttributeExists(attributeKey + ".6");
    // Ensure there are no more attributes
    out.assertAttributeNotExists(attributeKey + ".7");

    out.assertAttributeEquals(attributeKey, "a");
    out.assertAttributeEquals(attributeKey + ".0", "a=1");
    out.assertAttributeEquals(attributeKey + ".1", "a");
    out.assertAttributeEquals(attributeKey + ".2", "1");
    out.assertAttributeEquals(attributeKey + ".3", "b");
    out.assertAttributeEquals(attributeKey + ".4", "10");
    out.assertAttributeEquals(attributeKey + ".5", "c");
    out.assertAttributeEquals(attributeKey + ".6", "100");
}
```
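
To make the numbering I have in mind more concrete, here is a minimal standalone sketch that uses only `java.util.regex`. It is not the code from this PR, and `RepeatedGroupIndexing` and `extract` are hypothetical names I made up for illustration. It assumes the scheme that the test above expects: `key.0` holds the whole first match, `key` holds the first group of the first match, and the groups of every match are then numbered consecutively.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of ExtractText: it only illustrates the
// attribute numbering that the test case above assumes.
class RepeatedGroupIndexing {

    static Map<String, String> extract(String key, String regex, String input) {
        Map<String, String> attributes = new LinkedHashMap<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        int index = 1;          // running index shared by all matches
        boolean first = true;
        while (matcher.find()) {
            if (first) {
                // Only the first match contributes key.0 and the bare key
                attributes.put(key + ".0", matcher.group(0));
                if (matcher.groupCount() >= 1) {
                    attributes.put(key, matcher.group(1));
                }
                first = false;
            }
            // Flatten the groups of this match onto the running index.
            // Note: a group that did not participate in the match is stored as null.
            for (int g = 1; g <= matcher.groupCount(); g++) {
                attributes.put(key + "." + index++, matcher.group(g));
            }
        }
        return attributes;
    }

    public static void main(String[] args) {
        extract("regex.result", "(\\w+)=(\\d+)", "a=1,b=10,c=100")
                .forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

With two groups and three matches this yields `regex.result.1` through `regex.result.6`, so the index alone does not tell you which match a value came from. That is why I think the documentation should spell the scheme out.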