[ https://issues.apache.org/jira/browse/NIFI-4227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131915#comment-16131915 ]

ASF GitHub Bot commented on NIFI-4227:
--------------------------------------

Github user pvillard31 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2037#discussion_r133908650
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/resources/docs/org.apache.nifi.processors.standard.ForkRecord/additionalDetails.html ---
    @@ -0,0 +1,365 @@
    +<!DOCTYPE html>
    +<html lang="en">
    +    <!--
    +      Licensed to the Apache Software Foundation (ASF) under one or more
    +      contributor license agreements.  See the NOTICE file distributed with
    +      this work for additional information regarding copyright ownership.
    +      The ASF licenses this file to You under the Apache License, Version 2.0
    +      (the "License"); you may not use this file except in compliance with
    +      the License.  You may obtain a copy of the License at
    +          http://www.apache.org/licenses/LICENSE-2.0
    +      Unless required by applicable law or agreed to in writing, software
    +      distributed under the License is distributed on an "AS IS" BASIS,
    +      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +      See the License for the specific language governing permissions and
    +      limitations under the License.
    +    -->
    +    <head>
    +        <meta charset="utf-8" />
    +        <title>ForkRecord</title>
    +
    +        <link rel="stylesheet" href="../../../../../css/component-usage.css" type="text/css" />
    +    </head>
    +
    +    <body>
    +           <p>
    +                   ForkRecord allows the user to fork a record into multiple records. To do that, the user must specify
    +                   a <a href="../../../../../html/record-path-guide.html">RecordPath</a> pointing to a field of type
    +                   ARRAY containing RECORD elements. The generated flow file will contain the records from the specified
    +                   array. It is also possible to add to each record all the fields of the parent records, from the root
    +                   level down to the record element being forked. However, this requires the fields to add to be defined
    +                   in the schema of the Record Writer controller service.
    +           </p>
    +           
    +           <h2>Examples</h2>
    +           
    +           <p>
    +                   To better understand how this Processor works, we will lay out a few examples. For the sake of these examples, let's assume that our input
    +                   data is JSON formatted and looks like this:
    +           </p>
    +
    +<code>
    +<pre>
    +[{
    +   "id": 1,
    +   "name": "John Doe",
    +   "address": "123 My Street",
    +   "city": "My City", 
    +   "state": "MS",
    +   "zipCode": "11111",
    +   "country": "USA",
    +   "accounts": [{
    +           "id": 42,
    +           "balance": 4750.89
    +   }, {
    +           "id": 43,
    +           "balance": 48212.38
    +   }]
    +}, 
    +{
    +   "id": 2,
    +   "name": "Jane Doe",
    +   "address": "345 My Street",
    +   "city": "Her City", 
    +   "state": "NY",
    +   "zipCode": "22222",
    +   "country": "USA",
    +   "accounts": [{
    +           "id": 45,
    +           "balance": 6578.45
    +   }, {
    +           "id": 46,
    +           "balance": 34567.21
    +   }]
    +}]
    +</pre>
    +</code>
    +
    +
    +           <h3>Example 1 - Fork without parent fields</h3>
    +           
    +           <p>
    +                   In this case, we want to create one record per <code>account</code> and we don't care about
    +                   the other fields. We'll set the Record path property to <code>/accounts</code>. The resulting
    +                   flow file will contain 4 records and will look like this (assuming the Record Writer schema is
    +                   correctly set):
    +           </p>
    +
    +<code>
    +<pre>
    +[{
    +   "id": 42,
    +   "balance": 4750.89
    +}, {
    +   "id": 43,
    +   "balance": 48212.38
    +}, {
    +   "id": 45,
    +   "balance": 6578.45
    +}, {
    +   "id": 46,
    +   "balance": 34567.21
    +}]
    +</pre>
    +</code>
    +
    +           
    +           <h3>Example 2 - Fork with parent fields</h3>
    +           
    +           <p>
    +                   Now, if we set the property "Include parent fields" to true, this will recursively include
    --- End diff --
    
    In the use case that motivated this processor, being able to "extract" each element and "merge" it with the parent fields is really useful. I agree that this could be achieved with the UpdateRecord processor (and the addition of [NIFI-4270](https://issues.apache.org/jira/browse/NIFI-4270)), but it would require one processor per type of record and could mean a lot of configuration (in my case, I have hundreds of types of records to "extract").
    
    What about an intermediate solution? The "Include parent fields" property does not really make sense, since it is up to the writing schema to define what should be in the output. I propose renaming this property to something like "Fork strategy", with two possible values:
    - Split: this would create the output you propose. It won't change the structure, but each generated record will contain a 1-element array.
    - Extract: this would create the current output, merging in the parent fields (assuming the output schema contains the parent fields).
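A minimal Python sketch of what the proposed "Split" value could produce, assuming it means one copy of the parent record per array element, with the array field reduced to a 1-element array. Plain dicts stand in for NiFi records, the array is addressed by field name rather than by a RecordPath, the Record Writer schema is ignored, and `fork_split` is a hypothetical helper, not part of the NiFi API.

```python
import copy
from typing import Any

Record = dict[str, Any]


def fork_split(records: list[Record], array_field: str) -> list[Record]:
    """Hypothetical "Split" strategy: one output record per array element.

    Each output keeps the full parent structure, but `array_field` holds a
    1-element array containing only the current element.
    """
    forked: list[Record] = []
    for parent in records:
        for element in parent.get(array_field, []):
            clone = copy.deepcopy(parent)                  # keep every parent field
            clone[array_field] = [copy.deepcopy(element)]  # 1-element array
            forked.append(clone)
    return forked


if __name__ == "__main__":
    people = [
        {"id": 1, "name": "John Doe",
         "accounts": [{"id": 42, "balance": 4750.89},
                      {"id": 43, "balance": 48212.38}]},
    ]
    for rec in fork_split(people, "accounts"):
        print(rec)
```

Because nothing is flattened in this mode, each output record should still match the input schema, so the writer could in principle reuse the reader's schema; deep copies keep the outputs independent of the input records.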
    
    What do you think?


> Create a ForkRecord processor
> -----------------------------
>
>                 Key: NIFI-4227
>                 URL: https://issues.apache.org/jira/browse/NIFI-4227
>             Project: Apache NiFi
>          Issue Type: New Feature
>          Components: Extensions
>            Reporter: Pierre Villard
>            Assignee: Pierre Villard
>         Attachments: TestForkRecord.xml
>
>
> I'd like a way to fork a record containing an array of records into multiple
> records, each one being an element of the array. In addition, if configured
> to, I'd like the option to add the parent fields to each new record.
> For example, if I have:
> {noformat}
> [{
>       "id": 1,
>       "name": "John Doe",
>       "address": "123 My Street",
>       "city": "My City", 
>       "state": "MS",
>       "zipCode": "11111",
>       "country": "USA",
>       "accounts": [{
>               "id": 42,
>               "balance": 4750.89
>       }, {
>               "id": 43,
>               "balance": 48212.38
>       }]
> }, 
> {
>       "id": 2,
>       "name": "Jane Doe",
>       "address": "345 My Street",
>       "city": "Her City", 
>       "state": "NY",
>       "zipCode": "22222",
>       "country": "USA",
>       "accounts": [{
>               "id": 45,
>               "balance": 6578.45
>       }, {
>               "id": 46,
>               "balance": 34567.21
>       }]
> }]
> {noformat}
> Then, I want to generate records looking like:
> {noformat}
> [{
>       "id": 42,
>       "balance": 4750.89
> }, {
>       "id": 43,
>       "balance": 48212.38
> }, {
>       "id": 45,
>       "balance": 6578.45
> }, {
>       "id": 46,
>       "balance": 34567.21
> }]
> {noformat}
> Or, if parent fields are included, looking like:
> {noformat}
> [{
>       "name": "John Doe",
>       "address": "123 My Street",
>       "city": "My City", 
>       "state": "MS",
>       "zipCode": "11111",
>       "country": "USA",
>       "id": 42,
>       "balance": 4750.89
> }, {
>       "name": "John Doe",
>       "address": "123 My Street",
>       "city": "My City", 
>       "state": "MS",
>       "zipCode": "11111",
>       "country": "USA",
>       "id": 43,
>       "balance": 48212.38
> }, {
>       "name": "Jane Doe",
>       "address": "345 My Street",
>       "city": "Her City", 
>       "state": "NY",
>       "zipCode": "22222",
>       "country": "USA",
>       "id": 45,
>       "balance": 6578.45
> }, {
>       "name": "Jane Doe",
>       "address": "345 My Street",
>       "city": "Her City", 
>       "state": "NY",
>       "zipCode": "22222",
>       "country": "USA",
>       "id": 46,
>       "balance": 34567.21
> }]
> {noformat}
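The two expected outputs above can be reproduced with a short Python sketch. It is not the ForkRecord implementation: plain dicts stand in for records, the array field is addressed by name instead of a RecordPath, the Record Writer schema is ignored, and `fork_record` / `include_parent_fields` are hypothetical names. When parent fields are included, a field present in both the parent and the element (here `id`) takes the element's value, which matches the second expected output.

```python
from typing import Any

Record = dict[str, Any]


def fork_record(records: list[Record], array_field: str,
                include_parent_fields: bool = False) -> list[Record]:
    """Return one record per element of `array_field` across all input records."""
    forked: list[Record] = []
    for parent in records:
        # Every parent field except the array being forked.
        parent_fields = {k: v for k, v in parent.items() if k != array_field}
        for element in parent.get(array_field, []):
            if include_parent_fields:
                # Element fields are applied last, so they win on name clashes.
                forked.append({**parent_fields, **element})
            else:
                forked.append(dict(element))  # shallow copy of the element only
    return forked


if __name__ == "__main__":
    data = [
        {"id": 1, "name": "John Doe", "city": "My City",
         "accounts": [{"id": 42, "balance": 4750.89},
                      {"id": 43, "balance": 48212.38}]},
        {"id": 2, "name": "Jane Doe", "city": "Her City",
         "accounts": [{"id": 45, "balance": 6578.45},
                      {"id": 46, "balance": 34567.21}]},
    ]
    for rec in fork_record(data, "accounts"):
        print(rec)
    for rec in fork_record(data, "accounts", include_parent_fields=True):
        print(rec)
```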



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
