Cool. Bryan offers a good approach for now, and this JIRA captures a really powerful way to do it going forward: https://issues.apache.org/jira/browse/NIFI-3866
Thanks
Joe

On Thu, May 11, 2017 at 10:41 AM, Bryan Bende <[email protected]> wrote:
> If your data is JSON, then you could extract the date field from the
> JSON before you convert to Avro by using EvaluateJsonPath.
>
> From there, let's say you have an attribute called "time" with the unix
> timestamp; you could use an UpdateAttribute processor to create
> attributes for each part of the timestamp:
>
> time.year = ${time:format("yyyy", "GMT")}
> time.month = ${time:format("MM", "GMT")}
> time.day = ${time:format("dd", "GMT")}
>
> Then in PutHDFS you can do something similar to what you were already doing:
>
> /year=${time.year}/month=${time.month}/day=${time.day}/
>
> As Joe mentioned, there are a bunch of new record reader/writer-related
> capabilities in 1.2.0, and there is a follow-on JIRA to add a "record
> path" which would allow you to extract a value (like your date field)
> from any data format.
>
> On Thu, May 11, 2017 at 10:04 AM, Anshuman Ghosh
> <[email protected]> wrote:
>> Hello Joe,
>>
>> Apologies for the inconvenience; I will keep that in mind going forward!
>>
>> Thank you for your suggestion :-)
>> We have recently built NiFi from the master branch, so it should be similar
>> to 1.2.0.
>> We receive data in JSON format and then convert it to Avro before writing
>> to HDFS.
>> The date field here is a 19-digit Unix timestamp (bigint).
>>
>> It would be really great if you could help a bit on how we can achieve the
>> same with Avro here.
>> Thanking you in advance!
>>
>>
>> ______________________
>>
>> Kind Regards,
>> Anshuman Ghosh
>> Contact - +49 179 9090964
>>
>>
>> On Thu, May 11, 2017 at 3:53 PM, Joe Witt <[email protected]> wrote:
>>
>>> Anshuman
>>>
>>> Hello. Please avoid directly addressing specific developers and
>>> instead just address the mailing list you need (dev or user).
>>>
>>> If your data is CSV, for example, you can use RouteText to efficiently
>>> partition the incoming sets by matching field/column values, and in so
>>> doing you'll now have the flowfile attribute you need for that group.
>>> Then you can merge those together with MergeContent for like
>>> attributes, and when writing to HDFS you can use that value.
>>>
>>> With the new record reader/writer capabilities in Apache NiFi 1.2.0
>>> we can now provide a record-oriented PartitionRecord processor which
>>> will then also let you easily do this pattern on all kinds of
>>> formats/schemas in a nice/clean way.
>>>
>>> Joe
>>>
>>> On Thu, May 11, 2017 at 9:49 AM, Anshuman Ghosh
>>> <[email protected]> wrote:
>>> > Hello everyone,
>>> >
>>> > It would be great if you could help me implement this use case.
>>> >
>>> > Is there any way (NiFi processor) to use an attribute (field/column)
>>> > value for partitioning when writing the final FlowFile to HDFS or
>>> > other storage?
>>> > Earlier we were using the simple system date
>>> > (/year=${now():format('yyyy')}/month=${now():format('MM')}/day=${now():format('dd')}/)
>>> > for this, but that doesn't make sense when we consume old data from
>>> > Kafka and want to partition on the original date (a date field inside
>>> > the Kafka message).
>>> >
>>> >
>>> > Thank you!
>>> > ______________________
>>> >
>>> > Kind Regards,
>>> > Anshuman Ghosh
>>> > Contact - +49 179 9090964
>>> >
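
For anyone following along outside NiFi, here is a minimal Python sketch of
the logic Bryan describes above. The "time" field name comes from the thread;
the sample record is hypothetical, and the 19-digit value is assumed to be
nanoseconds since the epoch (10 digits would be seconds, 13 milliseconds):

    import json
    from datetime import datetime, timezone

    # Hypothetical incoming JSON message with a 19-digit (nanosecond) timestamp.
    record = '{"time": 1494511260000000000, "value": 42}'

    # Equivalent of EvaluateJsonPath: pull the "time" field out of the JSON.
    ts_ns = json.loads(record)["time"]

    # Scale nanoseconds down to seconds and interpret in UTC/GMT,
    # matching the "GMT" argument in the format() expressions above.
    dt = datetime.fromtimestamp(ts_ns / 1_000_000_000, tz=timezone.utc)

    # Equivalent of the UpdateAttribute expressions.
    time_year = dt.strftime("%Y")
    time_month = dt.strftime("%m")
    time_day = dt.strftime("%d")

    # Equivalent of the PutHDFS Directory property.
    directory = f"/year={time_year}/month={time_month}/day={time_day}/"
    print(directory)  # /year=2017/month=05/day=11/

One thing to watch: if I remember right, NiFi's format() treats the number as
milliseconds since the epoch, so a 19-digit nanosecond value would likely need
to be scaled down first, e.g. ${time:divide(1000000):format("yyyy", "GMT")},
before the expressions above produce sensible dates.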
