[jira] [Commented] (NIFI-4227) Create a ForkRecord processor

ASF GitHub Bot (JIRA) Thu, 17 Aug 2017 09:52:17 -0700

    [ 
https://issues.apache.org/jira/browse/NIFI-4227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16130808#comment-16130808
 ]


ASF GitHub Bot commented on NIFI-4227:
--------------------------------------

Github user markap14 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2037#discussion_r133768739
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/resources/docs/org.apache.nifi.processors.standard.ForkRecord/additionalDetails.html ---
    @@ -0,0 +1,365 @@
    +<!DOCTYPE html>
    +<html lang="en">
    +    <!--
    +      Licensed to the Apache Software Foundation (ASF) under one or more
    +      contributor license agreements.  See the NOTICE file distributed with
    +      this work for additional information regarding copyright ownership.
    +      The ASF licenses this file to You under the Apache License, Version 2.0
    +      (the "License"); you may not use this file except in compliance with
    +      the License.  You may obtain a copy of the License at
    +          http://www.apache.org/licenses/LICENSE-2.0
    +      Unless required by applicable law or agreed to in writing, software
    +      distributed under the License is distributed on an "AS IS" BASIS,
    +      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +      See the License for the specific language governing permissions and
    +      limitations under the License.
    +    -->
    +    <head>
    +        <meta charset="utf-8" />
    +        <title>ForkRecord</title>
    +
    +        <link rel="stylesheet" href="../../../../../css/component-usage.css" type="text/css" />
    +    </head>
    +
    +    <body>
    +           <p>
    +                   ForkRecord allows the user to fork a record into multiple records. To do that, the user must specify
    +                   a <a href="../../../../../html/record-path-guide.html">RecordPath</a> pointing to a field of type
    +                   ARRAY containing RECORD elements. The generated flow file will contain the records from the specified
    +                   array. It is also possible to add in each record all the fields of the parent records from the root
    +                   level to the record element being forked. However it supposes the fields to add are defined in the
    +                   schema of the Record Writer controller service.
    +           </p>
    +           
    +           <h2>Examples</h2>
    +           
    +           <p>
    +                   To better understand how this Processor works, we will lay out a few examples. For the sake of these examples, let's assume that our input
    +                   data is JSON formatted and looks like this:
    +           </p>
    +
    +<code>
    +<pre>
    +[{
    +   "id": 1,
    +   "name": "John Doe",
    +   "address": "123 My Street",
    +   "city": "My City", 
    +   "state": "MS",
    +   "zipCode": "11111",
    +   "country": "USA",
    +   "accounts": [{
    +           "id": 42,
    +           "balance": 4750.89
    +   }, {
    +           "id": 43,
    +           "balance": 48212.38
    +   }]
    +}, 
    +{
    +   "id": 2,
    +   "name": "Jane Doe",
    +   "address": "345 My Street",
    +   "city": "Her City", 
    +   "state": "NY",
    +   "zipCode": "22222",
    +   "country": "USA",
    +   "accounts": [{
    +           "id": 45,
    +           "balance": 6578.45
    +   }, {
    +           "id": 46,
    +           "balance": 34567.21
    +   }]
    +}]
    +</pre>
    +</code>
    +
    +
    +           <h3>Example 1 - Fork without parent fields</h3>
    +           
    +           <p>
    +                   For this case, we want to create one record per <code>account</code> and we don't care about
    +                   the other fields. We'll set the Record path property to <code>/accounts</code>. The resulting
    +                   flow file will contain 4 records and will look like (assuming the Record Writer schema is
    +                   correctly set):
    +           </p>
    +
    +<code>
    +<pre>
    +[{
    +   "id": 42,
    +   "balance": 4750.89
    +}, {
    +   "id": 43,
    +   "balance": 48212.38
    +}, {
    +   "id": 45,
    +   "balance": 6578.45
    +}, {
    +   "id": 46,
    +   "balance": 34567.21
    +}]
    +</pre>
    +</code>
    +
    +           
    +           <h3>Example 2 - Fork with parent fields</h3>
    +           
    +           <p>
    +                   Now, if we set the property "Include parent fields" to true, this will recursively include
    --- End diff --
    
    In such a case, I would actually have expected the result to have an
    'accounts' field that is a 1-element array. If we wanted to promote that
    up to the top level, an UpdateRecord processor could be used.
    So in general, I would expect the processor to create one copy of the
    Record per element of the array denoted by the RecordPath. Each copy would
    have exactly the same structure as the original, except that the array
    would contain only that single element. For example, if the input looked like:
    
    ```
    {
        "id": 1,
        "members": [{
            "id": 42,
            "name": "John Doe",
            "accounts": [
                {
                    "id": 382,
                    "name": "first account",
                    "balance": 17.82
                }, {
                    "id": 482,
                    "name": "other account",
                    "balance": 182.34
                }
            ]
        }, {
            "id": 43,
            "name": "Jane Doe",
            "accounts": [
                {
                    "id": 492,
                    "name": "yet another account",
                    "balance": 21.12
                }, {
                    "id": 513,
                    "name": "final account",
                    "balance": 142.22
                }
            ]
        }]
    }
    ```
    
    Then, if we set the Record Path to `/members`, I would expect output of:
    
    ```
    [{
        "id": 1,
        "members": [{
            "id": 42,
            "name": "John Doe",
            "accounts": [
                {
                    "id": 382,
                    "name": "first account",
                    "balance": 17.82
                }, {
                    "id": 482,
                    "name": "other account",
                    "balance": 182.34
                }
            ]
        }]
    }, {
        "id": 1,
        "members": [{
            "id": 43,
            "name": "Jane Doe",
            "accounts": [
                {
                    "id": 492,
                    "name": "yet another account",
                    "balance": 21.12
                }, {
                    "id": 513,
                    "name": "final account",
                    "balance": 142.22
                }
            ]
        }]
    }]
    ```
    
    But if I set the Record Path to `/members[*]/accounts`, then I would expect
    4 records to be output, each having a single element in the 'members' array
    and each having a single element in the 'accounts' sub-array.
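
    The behaviour I have in mind can be sketched in a few lines of Python. This
    is only an illustration of the proposed semantics, not NiFi code: the `fork`
    helper is hypothetical, and a plain list of field names stands in for a real
    RecordPath such as `/members[*]/accounts`.

```python
import copy

def fork(record, path):
    """Yield copies of `record` in which each array named along `path`
    has been narrowed to a single element. `path` is a list of field
    names (a hypothetical stand-in for a RecordPath)."""
    if not path:
        yield copy.deepcopy(record)
        return
    field, rest = path[0], path[1:]
    for element in record.get(field, []):
        # Fork the nested element first, then wrap each result in a
        # one-element array inside a copy of the parent record.
        for forked in fork(element, rest):
            yield {
                key: ([forked] if key == field else copy.deepcopy(value))
                for key, value in record.items()
            }

record = {
    "id": 1,
    "members": [
        {"id": 42, "name": "John Doe", "accounts": [
            {"id": 382, "name": "first account", "balance": 17.82},
            {"id": 482, "name": "other account", "balance": 182.34},
        ]},
        {"id": 43, "name": "Jane Doe", "accounts": [
            {"id": 492, "name": "yet another account", "balance": 21.12},
            {"id": 513, "name": "final account", "balance": 142.22},
        ]},
    ],
}

by_member = list(fork(record, ["members"]))
by_account = list(fork(record, ["members", "accounts"]))
print(len(by_member), len(by_account))
```

    With the sample input above, forking on `members` gives 2 records and
    forking on `members` then `accounts` gives 4, each keeping the top-level
    `id` and the full nesting.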
    
    If we took this approach, I think it would make the processor a lot more
    powerful, because an UpdateRecord processor could be used next if necessary
    to pare down the records to the desired fields. This gives us a lot of
    flexibility with just 1 or 2 processors.
    
    Thoughts?


> Create a ForkRecord processor
> -----------------------------
>
>                 Key: NIFI-4227
>                 URL: https://issues.apache.org/jira/browse/NIFI-4227
>             Project: Apache NiFi
>          Issue Type: New Feature
>          Components: Extensions
>            Reporter: Pierre Villard
>            Assignee: Pierre Villard
>         Attachments: TestForkRecord.xml
>
>
> I'd like a way to fork a record containing an array of records into multiple 
> records, each one being an element of the array. In addition, if configured 
> to, I'd like the option to add to each new record the parent fields.
> For example, if I have:
> {noformat}
> [{
>       "id": 1,
>       "name": "John Doe",
>       "address": "123 My Street",
>       "city": "My City", 
>       "state": "MS",
>       "zipCode": "11111",
>       "country": "USA",
>       "accounts": [{
>               "id": 42,
>               "balance": 4750.89
>       }, {
>               "id": 43,
>               "balance": 48212.38
>       }]
> }, 
> {
>       "id": 2,
>       "name": "Jane Doe",
>       "address": "345 My Street",
>       "city": "Her City", 
>       "state": "NY",
>       "zipCode": "22222",
>       "country": "USA",
>       "accounts": [{
>               "id": 45,
>               "balance": 6578.45
>       }, {
>               "id": 46,
>               "balance": 34567.21
>       }]
> }]
> {noformat}
> Then, I want to generate records looking like:
> {noformat}
> [{
>       "id": 42,
>       "balance": 4750.89
> }, {
>       "id": 43,
>       "balance": 48212.38
> }, {
>       "id": 45,
>       "balance": 6578.45
> }, {
>       "id": 46,
>       "balance": 34567.21
> }]
> {noformat}
> Or, if parent fields are included, looking like:
> {noformat}
> [{
>       "name": "John Doe",
>       "address": "123 My Street",
>       "city": "My City", 
>       "state": "MS",
>       "zipCode": "11111",
>       "country": "USA",
>       "id": 42,
>       "balance": 4750.89
> }, {
>       "name": "John Doe",
>       "address": "123 My Street",
>       "city": "My City", 
>       "state": "MS",
>       "zipCode": "11111",
>       "country": "USA",
>       "id": 43,
>       "balance": 48212.38
> }, {
>       "name": "Jane Doe",
>       "address": "345 My Street",
>       "city": "Her City", 
>       "state": "NY",
>       "zipCode": "22222",
>       "country": "USA",
>       "id": 45,
>       "balance": 6578.45
> }, {
>       "name": "Jane Doe",
>       "address": "345 My Street",
>       "city": "Her City", 
>       "state": "NY",
>       "zipCode": "22222",
>       "country": "USA",
>       "id": 46,
>       "balance": 34567.21
> }]
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
