[ 
https://issues.apache.org/jira/browse/NIFI-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148956#comment-17148956
 ] 

Otto Fowler commented on NIFI-2072:
-----------------------------------

[~pvillard]

Something like this?  The restriction when enabling the property is:  if you 
want named groups, all of your capturing groups MUST be named.  You can't mix 
named and unnamed captures.


{code:java}
    final String SAMPLE_STRING = "foo\r\nbar1\r\nbar2\r\nbar3\r\nhello\r\nworld\r\n";

    @Test
    public void testProcessorWithGroupNames() throws Exception {

        final TestRunner testRunner = TestRunners.newTestRunner(new ExtractText());

        testRunner.setProperty("regex.result1", "(?s)(?<all>.*)");
        testRunner.setProperty("regex.result2", "(?s).*(?<bar1>bar1).*");
        testRunner.setProperty("regex.result3", "(?s).*?(?<bar1>bar\\d).*"); 
        // (?:...) is non-capturing, so it does not count as an unnamed group.
        testRunner.setProperty("regex.result4", "(?s).*?(?:bar\\d).*?(?<bar2>bar\\d).*?(?<bar3>bar3).*");
        testRunner.setProperty("regex.result5", "(?s).*(?<bar3>bar\\d).*"); 
        testRunner.setProperty("regex.result6", "(?s)^(?<all>.*)$");
        testRunner.setProperty("regex.result7", "(?s)(?<miss>XXX)");
        testRunner.setProperty(ENABLE_NAMED_GROUPS, "true");
        testRunner.enqueue(SAMPLE_STRING.getBytes("UTF-8"));
        testRunner.run();

        testRunner.assertAllFlowFilesTransferred(ExtractText.REL_MATCH, 1);
        final MockFlowFile out = testRunner.getFlowFilesForRelationship(ExtractText.REL_MATCH).get(0);
        // Attributes are keyed as <property name>.<group name>.
        out.assertAttributeEquals("regex.result1.all", SAMPLE_STRING);
        out.assertAttributeEquals("regex.result2.bar1", "bar1");
        out.assertAttributeEquals("regex.result3.bar1", "bar1");
        out.assertAttributeEquals("regex.result4.bar2", "bar2");
        out.assertAttributeEquals("regex.result4.bar3", "bar3");
        out.assertAttributeEquals("regex.result5.bar3", "bar3");
        out.assertAttributeEquals("regex.result6.all", SAMPLE_STRING);
        // result7 never matches the sample, so its attribute must be absent.
        out.assertAttributeEquals("regex.result7.miss", null);
    }
{code}
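
For what it's worth, the "all capturing groups must be named" restriction could be checked along the lines below. This is only a sketch, not what the processor does: the class and method names (NamedGroupCheck, groupNames, allGroupsNamed) are made up for illustration, and the name scan is a plain text search, so it can be fooled by an escaped "\(?<x>" or by a group written inside \Q...\E or a character class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of ExtractText.
final class NamedGroupCheck {

    // Matches the opening of a named group such as "(?<bar1>". Lookbehinds
    // "(?<=" and "(?<!" are skipped because a group name must start with a letter.
    private static final Pattern NAMED_GROUP =
            Pattern.compile("\\(\\?<([a-zA-Z][a-zA-Z0-9]*)>");

    // Returns the group names in the order they appear in the regex text.
    static List<String> groupNames(String regex) {
        final List<String> names = new ArrayList<>();
        final Matcher m = NAMED_GROUP.matcher(regex);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    // True when the number of named groups equals the total number of
    // capturing groups, i.e. there are no unnamed captures mixed in.
    static boolean allGroupsNamed(String regex) {
        final int total = Pattern.compile(regex).matcher("").groupCount();
        return groupNames(regex).size() == total;
    }

    public static void main(String[] args) {
        // Two named groups plus one non-capturing group: prints true
        System.out.println(allGroupsNamed("(?s).*?(?:bar\\d).*?(?<bar2>bar\\d).*?(?<bar3>bar3).*"));
        // One named and one unnamed group: prints false
        System.out.println(allGroupsNamed("(?s)(?<all>.*)(bar\\d)"));
    }
}
```

A pattern with no capturing groups at all passes this check trivially, so a real validator would probably want to decide separately what to do in that case.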


> Support named captures in ExtractText
> -------------------------------------
>
>                 Key: NIFI-2072
>                 URL: https://issues.apache.org/jira/browse/NIFI-2072
>             Project: Apache NiFi
>          Issue Type: Improvement
>            Reporter: Joey Frazee
>            Assignee: Otto Fowler
>            Priority: Major
>
> ExtractText currently captures and creates attributes using numeric indices 
> (e.g., attribute.name.0, attribute.name.1, etc.) whether or not the capture 
> groups are named, i.e., patterns like (?<name>\w+).
> In addition to being more faithful to the provided regexes, named captures 
> could help simplify data flows because you wouldn't have to add superfluous 
> UpdateAttribute steps which are just renaming the indexed captures to more 
> interpretable names.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
