Github user ijokarumawak commented on the issue:

    https://github.com/apache/nifi/pull/1050
  
    Thanks @pvillard31, this enhancement would be useful!
    I reviewed the change and played with the unit tests, and have three 
comments I'd like to share.
    
    ### 1. Change of default behavior
    
    I'm worried about the effect on existing data flows. Since there's no 
guarantee that nobody relies on the original behavior intentionally, I would 
prefer to add a new Processor property to enable this feature, such as 
'Enable Repeating Capture Groups: true/false' (defaulting to false), in order 
to keep current configurations intact.
    
    ### 2. Processor documentation
    
    The commit doesn't update the processor description, which still contains 
this sentence:
    
    > If the Regular Expression matches more than once, only the first match 
will be used.
    
    At the very least, this sentence should be updated.
    
    ### 3. Test case to clarify behavior
    
    I was wondering what happens when multiple capturing groups are specified 
and the regex matches repeatedly. Would you be interested in adding the 
following test case? Additional documentation on how the repeated capture 
groups are stored under indexed attribute names would be helpful, too.
    
    ```Java
        @Test
        public void testFindAllPair() throws Exception {
            final TestRunner testRunner = TestRunners.newTestRunner(new ExtractText());
            final String attributeKey = "regex.result";
            testRunner.setProperty(attributeKey, "(\\w+)=(\\d+)");
            testRunner.enqueue("a=1,b=10,c=100".getBytes("UTF-8"));
            testRunner.run();
            testRunner.assertAllFlowFilesTransferred(ExtractText.REL_MATCH, 1);
            final MockFlowFile out = testRunner.getFlowFilesForRelationship(ExtractText.REL_MATCH).get(0);
            // Ensure capture group 0 and every repeated capture group are in the resultant attributes
            out.assertAttributeExists(attributeKey + ".0");
            out.assertAttributeExists(attributeKey + ".1");
            out.assertAttributeExists(attributeKey + ".2");
            out.assertAttributeExists(attributeKey + ".3");
            out.assertAttributeExists(attributeKey + ".4");
            out.assertAttributeExists(attributeKey + ".5");
            out.assertAttributeExists(attributeKey + ".6");
            // Ensure there are no more attributes
            out.assertAttributeNotExists(attributeKey + ".7");
            out.assertAttributeEquals(attributeKey, "a");
            out.assertAttributeEquals(attributeKey + ".0", "a=1");
            out.assertAttributeEquals(attributeKey + ".1", "a");
            out.assertAttributeEquals(attributeKey + ".2", "1");
            out.assertAttributeEquals(attributeKey + ".3", "b");
            out.assertAttributeEquals(attributeKey + ".4", "10");
            out.assertAttributeEquals(attributeKey + ".5", "c");
            out.assertAttributeEquals(attributeKey + ".6", "100");
        }
    ```
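    
    To illustrate the indexing I have in mind, here is a minimal standalone 
sketch that uses only `java.util.regex`. It is not the code in this PR: 
`RepeatingCaptureSketch`, `extract`, and the `repeat` flag are hypothetical 
names, and the numbering simply mirrors what the test above expects (group 0 
is recorded for the first match only, then the capture groups of each match 
are numbered consecutively).
    
    ```Java
    import java.util.LinkedHashMap;
    import java.util.Map;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class RepeatingCaptureSketch {

        // Hypothetical helper (not part of ExtractText): flattens regex matches
        // into indexed attributes, following the numbering assumed in the test.
        static Map<String, String> extract(String key, String regex, String content, boolean repeat) {
            final Map<String, String> attrs = new LinkedHashMap<>();
            final Matcher matcher = Pattern.compile(regex).matcher(content);
            int index = 1;          // next index for a capture group value
            boolean first = true;
            while (matcher.find()) {
                if (first) {
                    // Only the first match contributes the whole match (".0")
                    // and the un-indexed attribute (first capture group).
                    attrs.put(key + ".0", matcher.group(0));
                    if (matcher.groupCount() >= 1) {
                        attrs.put(key, matcher.group(1));
                    }
                    first = false;
                }
                // Capture groups of every match continue the same index sequence.
                for (int g = 1; g <= matcher.groupCount(); g++) {
                    attrs.put(key + "." + index++, matcher.group(g));
                }
                if (!repeat) {
                    break;          // stop after the first match
                }
            }
            return attrs;
        }

        public static void main(String[] args) {
            extract("regex.result", "(\\w+)=(\\d+)", "a=1,b=10,c=100", true)
                    .forEach((k, v) -> System.out.println(k + " = " + v));
        }
    }
    ```
    
    With `repeat` set to false the loop stops after the first match, which is 
roughly the behavior that a property such as 'Enable Repeating Capture Groups' 
would preserve by default.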
    



