Re: FlattenJson

Matt Burgess Tue, 20 Mar 2018 13:06:42 -0700

Rather than restricting it to JSONPath, perhaps we should have a
RouteOnRecordPath or RouteRecord using the RecordPath API? Even better
would be the ability to use RecordPath functions in QueryRecord, but
that involves digging into Calcite as well.  I realize JSONPath might
have more capabilities than RecordPath at the moment, but it seems a
shame to force the user to convert to JSON to use a "RouteOnJSONPath"
processor; the record-aware processors are meant to replace that kind
of format-specific functionality.


Regards,
Matt

On Tue, Mar 20, 2018 at 12:19 PM, Sivaprasanna
<[email protected]> wrote:
> I like the idea that Otto suggested. RouteOnJSONPath makes more sense since
> making the flattened JSON write to attributes is restricted to that
> processor alone.
>
> On Tue, Mar 20, 2018 at 8:37 PM, Otto Fowler <[email protected]>
> wrote:
>
>> Why not create a new processor that does routeOnJSONPath and works on the
>> flow file?
>>
>>
>> On March 20, 2018 at 10:39:37, Jorge Machado ([email protected]) wrote:
>>
>> That is actually what we are doing with EvaluateJsonPath. The problem is
>> that it is hard to build something generic if we need to specify each
>> property by name; that’s why I had this idea.
>>
>> Should I make a PR for this, or is this too business-specific?
>>
>>
>> Jorge Machado
>>
>> > On 20 Mar 2018, at 15:30, Bryan Bende <[email protected]> wrote:
>> >
>> > Ok so I guess it depends whether you end up needing all 30 fields as
>> > attributes to achieve the logic in your flow, or if you only need a
>> > couple.
>> >
>> > If you only need a couple you could probably use EvaluateJsonPath
>> > after FlattenJson to extract just the couple of fields you need into
>> > attributes.
>> >
>> > If you need them all then I guess it makes sense to want the option to
>> > flatten into attributes.
>> >
>> > On Tue, Mar 20, 2018 at 10:14 AM, Jorge Machado <[email protected]> wrote:
>> >> From there on we use a lot of RouteOnAttribute processors and use those
>> >> values in SQL queries against other tables, like select * from someTable
>> >> where id=${myExtractedAttribute}
>> >> To be honest, I tried JoltTransformJSON but I could not get it working :)
>> >>
>> >> Jorge Machado
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>> On 20 Mar 2018, at 15:12, Matt Burgess <[email protected]> wrote:
>> >>>
>> >>> I think Bryan is asking about what happens AFTER this part of the
>> >>> flow. For example, if you are doing routing you can use QueryRecord
>> >>> (and you won't need the SplitJson), if you are doing transformations
>> >>> you can use JoltTransformJSON (often without SplitJson as well), etc.
>> >>>
>> >>> Regards,
>> >>> Matt
>> >>>
>> >>> On Tue, Mar 20, 2018 at 10:08 AM, Jorge Machado <[email protected]> wrote:
>> >>>> Hi Bryan,
>> >>>>
>> >>>> thanks for the help.
>> >>>> Our Flow: ExecuteSql -> convertToJSON -> SplitJson -> ExecuteScript
>> >>>> with attached code 1.
>> >>>>
>> >>>> We are now writing a custom processor that does this. It is a copy of
>> >>>> FlattenJson, but instead of putting the result into the flow file
>> >>>> content we put it into the attributes.
>> >>>> That’s why I asked if it makes sense to contribute this back.
>> >>>>
>> >>>>
>> >>>>
>> >>>> Attached code 1:
>> >>>>
>> >>>> import org.apache.commons.io.IOUtils
>> >>>> import org.apache.nifi.processor.io.InputStreamCallback
>> >>>> import java.nio.charset.*
>> >>>>
>> >>>> def flowFile = session.get()
>> >>>> if (flowFile == null) {
>> >>>>     return
>> >>>> }
>> >>>> def slurper = new groovy.json.JsonSlurper()
>> >>>> def attrs = [:] as Map<String,String>
>> >>>> // Read the flow file content and parse it as a JSON object
>> >>>> session.read(flowFile,
>> >>>>     { inputStream ->
>> >>>>         def text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
>> >>>>         def obj = slurper.parseText(text)
>> >>>>         // Copy each non-null, non-empty top-level value into the map
>> >>>>         obj.each { k, v ->
>> >>>>             if (v != null && v.toString() != "") {
>> >>>>                 attrs[k] = v.toString()
>> >>>>             }
>> >>>>         }
>> >>>>     } as InputStreamCallback)
>> >>>> // Write the collected values as attributes and route to success
>> >>>> flowFile = session.putAllAttributes(flowFile, attrs)
>> >>>> session.transfer(flowFile, REL_SUCCESS)
>> >>>>
>> >>>> some code removed
>> >>>>
>> >>>>
>> >>>> Jorge Machado
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>>> On 20 Mar 2018, at 15:03, Bryan Bende <[email protected]> wrote:
>> >>>>>
>> >>>>> Ok it is still not clear what the reason for needing it in attributes
>> >>>>> is though... Is there another processor you are using after this that
>> >>>>> only works off attributes?
>> >>>>>
>> >>>>> Just trying to understand if there is another way to accomplish what
>> >>>>> you want to do.
>> >>>>>
>> >>>>> On Tue, Mar 20, 2018 at 9:50 AM, Jorge Machado <[email protected]>
>> wrote:
>> >>>>>> We are using NiFi for workflow, and we get columns from a database
>> >>>>>> like job_status and job_name plus some nested JSON columns (30
>> >>>>>> columns). We need to put them into the attributes of the flow file
>> >>>>>> and not the content. The first part (columns without JSON) is done by
>> >>>>>> a Groovy script, but it would be nice to use this standard processor
>> >>>>>> and, instead of writing the result to the flow file content, write it
>> >>>>>> to attributes.
>> >>>>>>
>> >>>>>>
>> >>>>>> Jorge Machado
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>> On 20 Mar 2018, at 14:47, Bryan Bende <[email protected]> wrote:
>> >>>>>>>
>> >>>>>>> What would be the main use case for wanting all the flattened
>> values
>> >>>>>>> in attributes?
>> >>>>>>>
>> >>>>>>> If the reason was to keep the original content, we could probably
>> >>>>>>> just add an original relationship.
>> >>>>>>>
>> >>>>>>> Also, I think FlattenJson supports flattening a flow file where the
>> >>>>>>> root is an array of JSON documents (although I'm not totally sure),
>> so
>> >>>>>>> you'd have to consider what to do in that case.
>> >>>>>>>
>> >>>>>>> On Tue, Mar 20, 2018 at 5:26 AM, Pierre Villard
>> >>>>>>> <[email protected]> wrote:
>> >>>>>>>> No, I do see how this could be convenient in some cases. My comment
>> >>>>>>>> was more: you can certainly submit a PR for that feature, but it'll
>> >>>>>>>> need to be clearly documented using the appropriate annotations,
>> >>>>>>>> documentation, and property descriptions.
>> >>>>>>>>
>> >>>>>>>> 2018-03-20 10:20 GMT+01:00 Jorge Machado <[email protected]>:
>> >>>>>>>>
>> >>>>>>>>> Hi Pierre, I’m aware of that. So this means the change would not
>> >>>>>>>>> be accepted, correct?
>> >>>>>>>>>
>> >>>>>>>>> Regards
>> >>>>>>>>>
>> >>>>>>>>> Jorge Machado
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>> On 20 Mar 2018, at 09:54, Pierre Villard <
>> [email protected]>
>> >>>>>>>>> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>> Hi Jorge,
>> >>>>>>>>>>
>> >>>>>>>>>> I think this should be carefully documented to remind users that
>> >>>>>>>>>> the attributes are held in memory. Doing what you propose would
>> >>>>>>>>>> mean keeping the full content of the flow file in memory for as
>> >>>>>>>>>> long as the flow file is processed in the workflow (unless you
>> >>>>>>>>>> remove attributes using UpdateAttribute).
>> >>>>>>>>>>
>> >>>>>>>>>> Pierre
>> >>>>>>>>>>
>> >>>>>>>>>> 2018-03-20 7:55 GMT+01:00 Jorge Machado <[email protected]>:
>> >>>>>>>>>>
>> >>>>>>>>>>> Hey guys,
>> >>>>>>>>>>>
>> >>>>>>>>>>> I would like to change the FlattenJson processor so that it can
>> >>>>>>>>>>> flatten to attributes instead of only to content. Is this a good
>> >>>>>>>>>>> idea? Would the PR be accepted?
>> >>>>>>>>>>>
>> >>>>>>>>>>> Cheers
>> >>>>>>>>>>>
>> >>>>>>>>>>> Jorge Machado
>> >>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>
>> >>>>
>> >>
>>
