[
https://issues.apache.org/jira/browse/NIFI-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148956#comment-17148956
]
Otto Fowler commented on NIFI-2072:
-----------------------------------
[~pvillard]
Something like this? Enabling the property comes with one restriction: if you
want named groups, all of your capturing groups MUST be named. You can't mix
named and unnamed captures.
{code:java}
final String SAMPLE_STRING = "foo\r\nbar1\r\nbar2\r\nbar3\r\nhello\r\nworld\r\n";

@Test
public void testProcessorWithGroupNames() throws Exception {
    final TestRunner testRunner = TestRunners.newTestRunner(new ExtractText());

    // Every capturing group is named; (?:...) is non-capturing and does not count.
    testRunner.setProperty("regex.result1", "(?s)(?<all>.*)");
    testRunner.setProperty("regex.result2", "(?s).*(?<bar1>bar1).*");
    testRunner.setProperty("regex.result3", "(?s).*?(?<bar1>bar\\d).*");
    testRunner.setProperty("regex.result4",
            "(?s).*?(?:bar\\d).*?(?<bar2>bar\\d).*?(?<bar3>bar3).*");
    testRunner.setProperty("regex.result5", "(?s).*(?<bar3>bar\\d).*");
    testRunner.setProperty("regex.result6", "(?s)^(?<all>.*)$");
    testRunner.setProperty("regex.result7", "(?s)(?<miss>XXX)");
    testRunner.setProperty(ENABLE_NAMED_GROUPS, "true");

    testRunner.enqueue(SAMPLE_STRING.getBytes("UTF-8"));
    testRunner.run();

    testRunner.assertAllFlowFilesTransferred(ExtractText.REL_MATCH, 1);
    final MockFlowFile out =
            testRunner.getFlowFilesForRelationship(ExtractText.REL_MATCH).get(0);

    // Attributes are keyed as <property name>.<group name>.
    out.assertAttributeEquals("regex.result1.all", SAMPLE_STRING);
    out.assertAttributeEquals("regex.result2.bar1", "bar1");
    out.assertAttributeEquals("regex.result3.bar1", "bar1");
    out.assertAttributeEquals("regex.result4.bar2", "bar2");
    out.assertAttributeEquals("regex.result4.bar3", "bar3");
    out.assertAttributeEquals("regex.result5.bar3", "bar3");
    out.assertAttributeEquals("regex.result6.all", SAMPLE_STRING);
    // regex.result7 does not match the sample, so no attribute is expected.
    out.assertAttributeEquals("regex.result7.miss", null);
}
{code}
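For illustration, the "all capturing groups must be named" restriction could be checked roughly as follows. This is only a sketch, not the code in the patch: the class `NamedGroupCheck` and its methods are hypothetical names, and the name scan is deliberately naive (it does not account for escaped parentheses or character classes). It compares the total number of capturing groups reported by `Matcher.groupCount()` with the number of `(?<name>` openers found in the pattern text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of ExtractText.
class NamedGroupCheck {
    // Matches the opening of a named group such as "(?<all>". Lookbehinds
    // "(?<=" and "(?<!" are skipped because a group name must start with a letter.
    private static final Pattern NAMED_GROUP =
            Pattern.compile("\\(\\?<([a-zA-Z][a-zA-Z0-9]*)>");

    // Naive scan of the pattern text for group names, in order of appearance.
    static List<String> groupNames(String regex) {
        List<String> names = new ArrayList<>();
        Matcher m = NAMED_GROUP.matcher(regex);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    // True when the pattern has at least one capturing group and every
    // capturing group is named. Non-capturing groups (?:...) are ignored
    // because groupCount() does not count them.
    static boolean allGroupsNamed(String regex) {
        int total = Pattern.compile(regex).matcher("").groupCount();
        int named = groupNames(regex).size();
        return total > 0 && total == named;
    }
}
```

Under this sketch, the `regex.result4` pattern from the test passes the check (two capturing groups, both named), while a mixed pattern such as `(?<a>x)(y)` does not.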
> Support named captures in ExtractText
> -------------------------------------
>
> Key: NIFI-2072
> URL: https://issues.apache.org/jira/browse/NIFI-2072
> Project: Apache NiFi
> Issue Type: Improvement
> Reporter: Joey Frazee
> Assignee: Otto Fowler
> Priority: Major
>
> ExtractText currently captures and creates attributes using numeric indices
> (e.g., attribute.name.0, attribute.name.1, etc.) whether or not the capture
> groups are named, i.e., patterns like (?<name>\w+).
> In addition to being more faithful to the provided regexes, named captures
> could help simplify data flows because you wouldn't have to add superfluous
> UpdateAttribute steps which are just renaming the indexed captures to more
> interpretable names.
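
The difference described in the issue can be sketched with plain `java.util.regex`. This is an illustration under assumptions, not ExtractText's implementation: the class `CaptureNaming`, its method names, and the `user` prefix are hypothetical, and the indexed naming simply follows the `attribute.name.0`, `attribute.name.1` scheme quoted in the description.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration, not part of NiFi.
class CaptureNaming {
    // Indexed style: prefix.0 holds the whole match, prefix.1..n the groups.
    static Map<String, String> indexed(String prefix, String regex, String input) {
        Map<String, String> attrs = new LinkedHashMap<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        if (m.find()) {
            for (int i = 0; i <= m.groupCount(); i++) {
                attrs.put(prefix + "." + i, m.group(i));
            }
        }
        return attrs;
    }

    // Named style: prefix.<group name>. The caller supplies the group names,
    // which must all exist in the pattern.
    static Map<String, String> named(String prefix, String regex, String input,
                                     String... names) {
        Map<String, String> attrs = new LinkedHashMap<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        if (m.find()) {
            for (String name : names) {
                attrs.put(prefix + "." + name, m.group(name));
            }
        }
        return attrs;
    }
}
```

For the pattern `(?<first>\w+) (?<last>\w+)` and the input `Ada Lovelace`, the indexed form yields `user.0`, `user.1`, and `user.2`, while the named form yields `user.first` and `user.last` directly, with no renaming step afterwards.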