Srini, I thought about it a little more and I think I have a temporary workaround. I still recommend you open the Jira, but the following regex should work for you:
^.*(.??)$
I’ll break down the regex:
^ - Match at the start of the content
.* - Match any character any number of times
(.??) - Capture group to match any character 0 or 1 times, reluctant (i.e. will
prefer 0 over 1), so after the greedy .* it captures the empty string
$ - Match the end of the content
This results in the following LogAttribute output:
--------------------------------------------------
Standard FlowFile Attributes
Key: 'entryDate'
Value: 'Mon Mar 13 17:38:03 PDT 2017'
Key: 'lineageStartDate'
Value: 'Mon Mar 13 17:38:03 PDT 2017'
Key: 'fileSize'
Value: '29'
FlowFile Attribute Map Content
Key: 'entire_match.0'
Value: 'This is a plaintext message. '
Key: 'filename'
Value: '1343455595942828'
Key: 'path'
Value: './'
Key: 'uuid'
Value: '9382e5f0-782d-4c71-963f-1004c2a50275'
--------------------------------------------------
Now your expression passes validation (because it has 1 explicit capture
group), but won’t waste space on duplicate attributes. You just have to
reference “attribute.0” instead of “attribute” in your follow-on processors (or
use UpdateAttribute to copy and delete the original attribute, but this also
wastes space).
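
If you want to sanity-check the group count outside NiFi, here is a quick sketch. It uses Python's re module purely as a stand-in for the Java regex engine NiFi runs on (the two agree for this particular pattern), and the variable names are just illustrative, not anything from ExtractText. The sample content is the plaintext message from the LogAttribute output above, without any trailing newline.

```python
import re

# Sample content, taken from the LogAttribute output in this message.
content = "This is a plaintext message."

# The original expression has no explicit capture group.
original = re.compile(r".*")
# The workaround adds one reluctant, optional capture group at the end.
workaround = re.compile(r"^.*(.??)$")

# Pattern.groups counts explicit capture groups (group 0 is not counted).
print("original groups:  ", original.groups)
print("workaround groups:", workaround.groups)

match = workaround.search(content)
whole = match.group(0)     # the entire match
captured = match.group(1)  # what the explicit group captured

print("group 0:", repr(whole))
print("group 1:", repr(captured))
```

The greedy .* consumes the whole line, so the reluctant group settles for zero characters. That empty capture is why the expression has exactly one explicit group yet carries no second copy of the content.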
Hope this helps until we can provide the improved UX.
Andy LoPresto
[email protected]
[email protected]
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69
> On Mar 13, 2017, at 5:00 PM, Andy LoPresto <[email protected]> wrote:
>
> Here is the specific source code for reference:
> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/ExtractText.java#L262-L262
>
>> On Mar 13, 2017, at 4:56 PM, Andy LoPresto <[email protected]> wrote:
>>
>> Yes, I evaluated locally and apparently the ExtractText regex validation
>> requires “1 to 40 capturing groups”. You can set “Include Capture Group 0”
>> to false to reduce the duplication of the captured attribute (you’ll go from
>> 3*n to 2*n). I am unaware of a technical reason the provided regex is
>> required to have at least one capture group. I would recommend you open a
>> Jira to reduce the minimum capture group count to 0 during validation if
>> “Include Capture Group 0” is set to true.
>>
>> [Attached screenshots: Screen Shot 2017-03-13 at 4.54.49 PM.png,
>> Screen Shot 2017-03-13 at 4.55.25 PM.png]
>>
>>> On Mar 13, 2017, at 3:25 PM, srini <[email protected]> wrote:
>>>
>>> Hi Andy,
>>> I dropped the idea of saving the flowfile to an attribute. So I am good in
>>> that part.
>>>
>>> And you said "An immediate fix is to remove the parentheses from your regex;
>>> .*"
>>> But it is not accepted if I remove the parentheses.
>>>
>>> thanks
>>> Srini
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> http://apache-nifi-developer-list.39713.n7.nabble.com/I-have-attribute-called-X-But-X-0-and-X-1-also-got-created-Why-tp15062p15114.html
>>> Sent from the Apache NiFi Developer List mailing list archive at Nabble.com
>>> <http://nabble.com/>.
>>
>
