[ 
https://issues.apache.org/jira/browse/NIFI-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15522554#comment-15522554
 ] 

Pierre Villard commented on NIFI-2072:
--------------------------------------

[~jfrazee] Here is a proposal:
- I add a property that lets users enable capture group naming.
- Enabling this property does not change the current behavior; it only 
generates additional attributes named after the corresponding capture 
groups.

Example:
Let's say the user has added the following property to the processor:
{code}
Property name = keyvalue
Property value = (\w+)=(?<value>\d+)
{code}

The data is:
{code}
a=1,b=10,c=100
{code}

With the repeating capture groups property enabled, the following attributes 
will be populated (in addition to the ones already created):
{code}
keyvalue.value.0=1
keyvalue.value.1=10
keyvalue.value.2=100
{code}

If the repeating capture groups property is not enabled, then we'll have:
{code}
keyvalue.value=1
{code}

If the regular expression is:
{code}
Property name = keyvalue
Property value = (?<key>\w+)=(?<value>\d+)
{code}

With the repeating capture groups property enabled, the following attributes 
will be populated (in addition to the ones already created):
{code}
keyvalue.value.0=1
keyvalue.value.1=10
keyvalue.value.2=100
keyvalue.key.0=a
keyvalue.key.1=b
keyvalue.key.2=c
{code}

If the repeating capture groups property is not enabled, then we'll have:
{code}
keyvalue.value=1
keyvalue.key=a
{code}
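
To make the naming scheme above concrete, here is a minimal sketch using plain java.util.regex. It is not the ExtractText implementation: the class NamedCaptureSketch and its methods groupNames and extract are hypothetical names chosen for illustration. Group names are discovered by scanning the pattern text, which is an assumption of this sketch (Pattern.namedGroups() only exists since Java 20).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of NiFi: it only illustrates the proposed attribute names.
class NamedCaptureSketch {

    // Matches "(?<name>" openers in the regex source. Lookbehinds such as "(?<=" or "(?<!"
    // are skipped because a group name must start with a letter.
    // Caveat: an escaped "\(?<name>" in the pattern would be misread as a named group.
    private static final Pattern GROUP_NAME =
            Pattern.compile("\\(\\?<([a-zA-Z][a-zA-Z0-9]*)>");

    // Returns the named groups in the order they appear in the regex.
    static List<String> groupNames(String regex) {
        List<String> names = new ArrayList<>();
        Matcher m = GROUP_NAME.matcher(regex);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    // Builds "<property>.<group>.<matchIndex>" attributes when repeating is true,
    // and "<property>.<group>" from the first match only when it is false.
    static Map<String, String> extract(String propertyName, String regex,
                                       String data, boolean repeating) {
        Map<String, String> attributes = new LinkedHashMap<>();
        List<String> names = groupNames(regex);
        Matcher m = Pattern.compile(regex).matcher(data);
        int matchIndex = 0;
        while (m.find()) {
            for (String name : names) {
                String value = m.group(name);
                if (value == null) {
                    continue; // the group did not participate in this match
                }
                String key = propertyName + "." + name
                        + (repeating ? "." + matchIndex : "");
                attributes.put(key, value);
            }
            if (!repeating) {
                break; // only the first match is used
            }
            matchIndex++;
        }
        return attributes;
    }
}
```

Calling NamedCaptureSketch.extract("keyvalue", "(?<key>\\w+)=(?<value>\\d+)", "a=1,b=10,c=100", true) yields the six keyvalue.key.N and keyvalue.value.N attributes listed above; passing false yields only keyvalue.key and keyvalue.value. Unnamed groups such as the (\w+) in the first pattern are ignored by this sketch, since they would keep their existing numeric attributes.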

Does this sound acceptable?

> Support named captures in ExtractText
> -------------------------------------
>
>                 Key: NIFI-2072
>                 URL: https://issues.apache.org/jira/browse/NIFI-2072
>             Project: Apache NiFi
>          Issue Type: Improvement
>            Reporter: Joey Frazee
>
> ExtractText currently captures and creates attributes using numeric indices 
> (e.g., attribute.name.0, attribute.name.1, etc.) whether or not the capture 
> groups are named, i.e., patterns like (?<name>\w+).
> In addition to being more faithful to the provided regexes, named captures 
> could help simplify data flows because you wouldn't have to add superfluous 
> UpdateAttribute steps which are just renaming the indexed captures to more 
> interpretable names.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
