[
https://issues.apache.org/jira/browse/NIFI-4227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131915#comment-16131915
]
ASF GitHub Bot commented on NIFI-4227:
--------------------------------------
Github user pvillard31 commented on a diff in the pull request:
https://github.com/apache/nifi/pull/2037#discussion_r133908650
--- Diff:
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/resources/docs/org.apache.nifi.processors.standard.ForkRecord/additionalDetails.html
---
@@ -0,0 +1,365 @@
+<!DOCTYPE html>
+<html lang="en">
+ <!--
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version
2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+ http://www.apache.org/licenses/LICENSE-2.0
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ -->
+ <head>
+ <meta charset="utf-8" />
+ <title>ForkRecord</title>
+
+ <link rel="stylesheet"
href="../../../../../css/component-usage.css" type="text/css" />
+ </head>
+
+ <body>
+ <p>
+ ForkRecord allows the user to fork a record into
multiple records. To do that, the user must specify
+ a <a
href="../../../../../html/record-path-guide.html">RecordPath</a> pointing to a
field of type
+ ARRAY containing RECORD elements. The generated flow
file will contain the records from the specified
+ array. It is also possible to add in each record all
the fields of the parent records from the root
+ level to the record element being forked. However it
supposes the fields to add are defined in the
+ schema of the Record Writer controller service.
+ </p>
+
+ <h2>Examples</h2>
+
+ <p>
+ To better understand how this Processor works, we will
lay out a few examples. For the sake of these examples, let's assume that our
input
+ data is JSON formatted and looks like this:
+ </p>
+
+<code>
+<pre>
+[{
+ "id": 1,
+ "name": "John Doe",
+ "address": "123 My Street",
+ "city": "My City",
+ "state": "MS",
+ "zipCode": "11111",
+ "country": "USA",
+ "accounts": [{
+ "id": 42,
+ "balance": 4750.89
+ }, {
+ "id": 43,
+ "balance": 48212.38
+ }]
+},
+{
+ "id": 2,
+ "name": "Jane Doe",
+ "address": "345 My Street",
+ "city": "Her City",
+ "state": "NY",
+ "zipCode": "22222",
+ "country": "USA",
+ "accounts": [{
+ "id": 45,
+ "balance": 6578.45
+ }, {
+ "id": 46,
+ "balance": 34567.21
+ }]
+}]
+</pre>
+</code>
+
+
+ <h3>Example 1 - Fork without parent fields</h3>
+
+ <p>
+ For this case, we want to create one record per
<code>account</code> and we don't care about
+ the other fields. We'll set the Record path property to
<code>/accounts</code>. The resulting
+ flow file will contain 4 records and will look like
(assuming the Record Writer schema is
+ correctly set):
+ </p>
+
+<code>
+<pre>
+[{
+ "id": 42,
+ "balance": 4750.89
+}, {
+ "id": 43,
+ "balance": 48212.38
+}, {
+ "id": 45,
+ "balance": 6578.45
+}, {
+ "id": 46,
+ "balance": 34567.21
+}]
+</pre>
+</code>
+
+
+ <h3>Example 2 - Fork with parent fields</h3>
+
+ <p>
+ Now, if we set the property "Include parent fields" to
true, this will recursively include
--- End diff --
In the use case that motivated this processor, being able to
"extract" each element and "merge" it with the parent fields is really useful.
I agree that this could be achieved with the UpdateRecord processor (plus the
addition of [NIFI-4270](https://issues.apache.org/jira/browse/NIFI-4270)), but
it would require one processor per record type and could mean a lot of
configuration (in my case, I have hundreds of record types to "extract").
What about an intermediate solution? The "Include parent fields" property
does not really make sense, since it is up to the writer schema to define what
should be in the output. I propose renaming this property to something like
"Fork strategy", with two possible values:
- Split: this would produce the output you propose. It keeps the record
structure unchanged, but the array in each output record contains a single element.
- Extract: this would produce the current output, merging in the
parent fields (assuming the output schema contains them).
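To make the two values concrete, here is a rough Python sketch of the intended semantics. It is not NiFi code: the function names are hypothetical, records are modeled as plain dicts, and letting the element's fields win on a name clash (such as `id`) is an assumption based on the expected output in the issue description.

```python
def fork_split(record, array_field):
    """Split: keep the parent structure; each output record holds a 1-element array."""
    # Shallow copy of the parent, with the array replaced by a single element.
    return [{**record, array_field: [element]}
            for element in record.get(array_field, [])]


def fork_extract(record, array_field):
    """Extract: each array element becomes a record merged with the parent fields."""
    parent = {k: v for k, v in record.items() if k != array_field}
    # Assumption: element fields override parent fields with the same name.
    return [{**parent, **element} for element in record.get(array_field, [])]


record = {
    "id": 1,
    "name": "John Doe",
    "accounts": [
        {"id": 42, "balance": 4750.89},
        {"id": 43, "balance": 48212.38},
    ],
}

split_records = fork_split(record, "accounts")
extract_records = fork_extract(record, "accounts")
```

In both cases the writer schema would still decide which of these fields actually end up in the output.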
What do you think?
> Create a ForkRecord processor
> -----------------------------
>
> Key: NIFI-4227
> URL: https://issues.apache.org/jira/browse/NIFI-4227
> Project: Apache NiFi
> Issue Type: New Feature
> Components: Extensions
> Reporter: Pierre Villard
> Assignee: Pierre Villard
> Attachments: TestForkRecord.xml
>
>
> I'd like a way to fork a record containing an array of records into multiple
> records, each one being an element of the array. In addition, I'd like an
> option to add the parent fields to each new record.
> For example, given this input:
> {noformat}
> [{
> "id": 1,
> "name": "John Doe",
> "address": "123 My Street",
> "city": "My City",
> "state": "MS",
> "zipCode": "11111",
> "country": "USA",
> "accounts": [{
> "id": 42,
> "balance": 4750.89
> }, {
> "id": 43,
> "balance": 48212.38
> }]
> },
> {
> "id": 2,
> "name": "Jane Doe",
> "address": "345 My Street",
> "city": "Her City",
> "state": "NY",
> "zipCode": "22222",
> "country": "USA",
> "accounts": [{
> "id": 45,
> "balance": 6578.45
> }, {
> "id": 46,
> "balance": 34567.21
> }]
> }]
> {noformat}
> Then, I want to generate records looking like:
> {noformat}
> [{
> "id": 42,
> "balance": 4750.89
> }, {
> "id": 43,
> "balance": 48212.38
> }, {
> "id": 45,
> "balance": 6578.45
> }, {
> "id": 46,
> "balance": 34567.21
> }]
> {noformat}
> Or, if parent fields are included, looking like:
> {noformat}
> [{
> "name": "John Doe",
> "address": "123 My Street",
> "city": "My City",
> "state": "MS",
> "zipCode": "11111",
> "country": "USA",
> "id": 42,
> "balance": 4750.89
> }, {
> "name": "John Doe",
> "address": "123 My Street",
> "city": "My City",
> "state": "MS",
> "zipCode": "11111",
> "country": "USA",
> "id": 43,
> "balance": 48212.38
> }, {
> "name": "Jane Doe",
> "address": "345 My Street",
> "city": "Her City",
> "state": "NY",
> "zipCode": "22222",
> "country": "USA",
> "id": 45,
> "balance": 6578.45
> }, {
> "name": "Jane Doe",
> "address": "345 My Street",
> "city": "Her City",
> "state": "NY",
> "zipCode": "22222",
> "country": "USA",
> "id": 46,
> "balance": 34567.21
> }]
> {noformat}