[ 
https://issues.apache.org/jira/browse/NIFI-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15515799#comment-15515799
 ] 

ASF GitHub Bot commented on NIFI-2071:
--------------------------------------

Github user ijokarumawak commented on the issue:

    https://github.com/apache/nifi/pull/1050
  
    Thanks @pvillard31 this enhancement would be useful!
    I reviewed the change and played with the unit tests, and I have 3 comments 
I'd like to share.
    
    ### 1. Change of default behavior
    
    I'm worried about the effect on existing data flows. Since there's no 
guarantee that nobody relies on the original behavior intentionally, I would 
prefer to add a new Processor property to enable this feature, such as 
'Enable Repeating Capture Groups: true/false', in order to keep current 
configurations intact.
    
    ### 2. Processor documentation
    
    The commit doesn't update the processor description, which still contains 
this sentence:
    
    > If the Regular Expression matches more than once, only the first match 
will be used.
    
    At a minimum, this sentence should be updated.
    
    ### 3. Test case to clarify behavior
    
    I was wondering what happens when multiple capturing groups are specified 
and the regex matches repeatedly. Would you be interested in adding the 
following test case? Additional documentation on how the repeated capture 
groups are stored with indexed attribute names would probably be helpful, too.
    
    ```Java
        @Test
        public void testFindAllPair() throws Exception {
            final TestRunner testRunner = TestRunners.newTestRunner(new ExtractText());
            final String attributeKey = "regex.result";
            testRunner.setProperty(attributeKey, "(\\w+)=(\\d+)");
            testRunner.enqueue("a=1,b=10,c=100".getBytes("UTF-8"));
            testRunner.run();
            testRunner.assertAllFlowFilesTransferred(ExtractText.REL_MATCH, 1);
            final MockFlowFile out = testRunner.getFlowFilesForRelationship(ExtractText.REL_MATCH).get(0);
            // Ensure the zero capture group is in the resultant attributes
            out.assertAttributeExists(attributeKey + ".0");
            out.assertAttributeExists(attributeKey + ".1");
            out.assertAttributeExists(attributeKey + ".2");
            out.assertAttributeExists(attributeKey + ".3");
            out.assertAttributeExists(attributeKey + ".4");
            out.assertAttributeExists(attributeKey + ".5");
            out.assertAttributeExists(attributeKey + ".6");
            // Ensure there are no more attributes
            out.assertAttributeNotExists(attributeKey + ".7");
            out.assertAttributeEquals(attributeKey, "a");
            out.assertAttributeEquals(attributeKey + ".0", "a=1");
            out.assertAttributeEquals(attributeKey + ".1", "a");
            out.assertAttributeEquals(attributeKey + ".2", "1");
            out.assertAttributeEquals(attributeKey + ".3", "b");
            out.assertAttributeEquals(attributeKey + ".4", "10");
            out.assertAttributeEquals(attributeKey + ".5", "c");
            out.assertAttributeEquals(attributeKey + ".6", "100");
        }
    ```
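    
    To make the attribute layout expected by the test above concrete, here is 
a minimal standalone sketch that uses only `java.util.regex`. It is not the 
ExtractText implementation: the class `RepeatedCaptureSketch` and the helper 
`extractAll` are hypothetical, and the numbering rule (group 0 recorded only 
for the first match, all other groups numbered consecutively across matches) 
is simply my reading of what the test asserts.
    
    ```java
    import java.util.LinkedHashMap;
    import java.util.Map;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class RepeatedCaptureSketch {

        // Hypothetical helper, not part of ExtractText: walks every match with
        // Matcher.find() and numbers the capture groups consecutively under one key.
        static Map<String, String> extractAll(String key, String regex, String content) {
            final Map<String, String> attrs = new LinkedHashMap<>();
            final Matcher matcher = Pattern.compile(regex).matcher(content);
            int index = 1;
            boolean first = true;
            while (matcher.find()) {
                if (first) {
                    // Assumption: the whole match (group 0) is kept for the first match only.
                    attrs.put(key + ".0", matcher.group(0));
                    // Assumption: the bare key holds the first capture group if one exists.
                    attrs.put(key, matcher.groupCount() > 0 ? matcher.group(1) : matcher.group(0));
                    first = false;
                }
                // Groups from later matches continue the same numbering sequence.
                for (int g = 1; g <= matcher.groupCount(); g++) {
                    attrs.put(key + "." + index++, matcher.group(g));
                }
            }
            return attrs;
        }

        public static void main(String[] args) {
            extractAll("regex.result", "(\\w+)=(\\d+)", "a=1,b=10,c=100")
                    .forEach((k, v) -> System.out.println(k + " = " + v));
        }
    }
    ```
    
    With the input from the test case, this yields `regex.result.1` through 
`regex.result.6` as `a`, `1`, `b`, `10`, `c`, `100`, which shows that the key 
and value of each pair end up in adjacent indexes rather than being grouped 
per match. That is the part I think the documentation should spell out.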
    



> Support repeating capture groups in ExtractText
> -----------------------------------------------
>
>                 Key: NIFI-2071
>                 URL: https://issues.apache.org/jira/browse/NIFI-2071
>             Project: Apache NiFi
>          Issue Type: Improvement
>            Reporter: Joey Frazee
>            Assignee: Pierre Villard
>             Fix For: 1.1.0
>
>
> ExtractText doesn't currently support repeating capture groups, so any 
> repeating patterns have to be specified by hand and can only be repeated a 
> fixed number of times.
> I think this is because it only uses find() and not findAll() in its pattern 
> matching [1].
> 1. 
> https://github.com/apache/nifi/blob/1bd2cf0d09a7111bcecffd0f473aa71c25a69845/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/ExtractText.java#L324



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
