[ https://issues.apache.org/jira/browse/NIFI-4227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16130808#comment-16130808 ]
ASF GitHub Bot commented on NIFI-4227:
--------------------------------------
Github user markap14 commented on a diff in the pull request:
https://github.com/apache/nifi/pull/2037#discussion_r133768739
--- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/resources/docs/org.apache.nifi.processors.standard.ForkRecord/additionalDetails.html ---
@@ -0,0 +1,365 @@
+<!DOCTYPE html>
+<html lang="en">
+ <!--
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+ http://www.apache.org/licenses/LICENSE-2.0
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ -->
+ <head>
+ <meta charset="utf-8" />
+ <title>ForkRecord</title>
+
+ <link rel="stylesheet" href="../../../../../css/component-usage.css" type="text/css" />
+ </head>
+
+ <body>
+ <p>
+ ForkRecord allows the user to fork a record into multiple records. To do that, the user must specify
+ a <a href="../../../../../html/record-path-guide.html">RecordPath</a> pointing to a field of type
+ ARRAY containing RECORD elements. The generated flow file will contain the records from the specified
+ array. It is also possible to add in each record all the fields of the parent records from the root
+ level to the record element being forked. However it supposes the fields to add are defined in the
+ schema of the Record Writer controller service.
+ </p>
+
+ <h2>Examples</h2>
+
+ <p>
+ To better understand how this Processor works, we will lay out a few examples. For the sake of these examples, let's assume that our input
+ data is JSON formatted and looks like this:
+ </p>
+
+<code>
+<pre>
+[{
+    "id": 1,
+    "name": "John Doe",
+    "address": "123 My Street",
+    "city": "My City",
+    "state": "MS",
+    "zipCode": "11111",
+    "country": "USA",
+    "accounts": [{
+        "id": 42,
+        "balance": 4750.89
+    }, {
+        "id": 43,
+        "balance": 48212.38
+    }]
+},
+{
+    "id": 2,
+    "name": "Jane Doe",
+    "address": "345 My Street",
+    "city": "Her City",
+    "state": "NY",
+    "zipCode": "22222",
+    "country": "USA",
+    "accounts": [{
+        "id": 45,
+        "balance": 6578.45
+    }, {
+        "id": 46,
+        "balance": 34567.21
+    }]
+}]
+</pre>
+</code>
+
+
+ <h3>Example 1 - Fork without parent fields</h3>
+
+ <p>
+ For this case, we want to create one record per <code>account</code> and we don't care about
+ the other fields. We'll set the Record path property to <code>/accounts</code>. The resulting
+ flow file will contain 4 records and will look like (assuming the Record Writer schema is
+ correctly set):
+ </p>
+
+<code>
+<pre>
+[{
+    "id": 42,
+    "balance": 4750.89
+}, {
+    "id": 43,
+    "balance": 48212.38
+}, {
+    "id": 45,
+    "balance": 6578.45
+}, {
+    "id": 46,
+    "balance": 34567.21
+}]
+</pre>
+</code>
+
+
+ <h3>Example 2 - Fork with parent fields</h3>
+
+ <p>
+ Now, if we set the property "Include parent fields" to true, this will recursively include
--- End diff --
In such a case, I would actually have expected the result to have an 'accounts' field that is a 1-element array. If we wanted to promote that up to the top level, an UpdateRecord processor could be used.

So in general, I would expect the processor to create a copy of the Record that contains the exact same structure as the original, except that the array denoted by the RecordPath holds a single element in each copy, with one copy per element of that array. For example, if the input looked like:
```
{
  "id": 1,
  "members": [{
    "id": 42,
    "name": "John Doe",
    "accounts": [
      {
        "id": 382,
        "name": "first account",
        "balance": 17.82
      }, {
        "id": 482,
        "name": "other account",
        "balance": 182.34
      }
    ]
  }, {
    "id": 43,
    "name": "Jane Doe",
    "accounts": [
      {
        "id": 492,
        "name": "yet another account",
        "balance": 21.12
      }, {
        "id": 513,
        "name": "final account",
        "balance": 142.22
      }
    ]
  }]
}
```
Then, if we set the Record Path to `/members`, I would expect output of:
```
[{
  "id": 1,
  "members": [{
    "id": 42,
    "name": "John Doe",
    "accounts": [
      {
        "id": 382,
        "name": "first account",
        "balance": 17.82
      }, {
        "id": 482,
        "name": "other account",
        "balance": 182.34
      }
    ]
  }]
}, {
  "id": 1,
  "members": [{
    "id": 43,
    "name": "Jane Doe",
    "accounts": [
      {
        "id": 492,
        "name": "yet another account",
        "balance": 21.12
      }, {
        "id": 513,
        "name": "final account",
        "balance": 142.22
      }
    ]
  }]
}]
```
But if I set the Record Path to `/members[*]/accounts`, then I would expect 4 records to be output, each having a single element in the 'members' array and a single element in the 'accounts' sub-array.
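
To make the proposed semantics concrete, here is a minimal Python sketch (not NiFi code) that forks a record along a chain of array fields while keeping the surrounding structure. It assumes records are plain dicts and replaces RecordPath with a hypothetical list of field names, so `["members", "accounts"]` stands in for `/members[*]/accounts`; `fork_preserving_structure` is an illustrative name, not part of any NiFi API.

```python
import copy
from typing import Any, Dict, List

Record = Dict[str, Any]


def fork_preserving_structure(record: Record, path: List[str]) -> List[Record]:
    """Fork `record` along a chain of array fields (hypothetical helper).

    `path` is a simplified stand-in for a RecordPath: ["members"] plays the
    role of /members, and ["members", "accounts"] the role of
    /members[*]/accounts. Every output record keeps the original structure,
    but each array named on the path holds exactly one element.
    """
    if not path:
        # End of the path: nothing left to split, return an independent copy.
        return [copy.deepcopy(record)]
    field, rest = path[0], path[1:]
    results: List[Record] = []
    for element in record.get(field, []):
        # Split deeper arrays inside this element first, then wrap each
        # result back into a copy of the enclosing record.
        for forked_element in fork_preserving_structure(element, rest):
            clone = copy.deepcopy(record)
            clone[field] = [forked_element]  # the 1-element array
            results.append(clone)
    return results


# The "members" example from the comment above.
sample = {
    "id": 1,
    "members": [
        {"id": 42, "name": "John Doe", "accounts": [
            {"id": 382, "name": "first account", "balance": 17.82},
            {"id": 482, "name": "other account", "balance": 182.34}]},
        {"id": 43, "name": "Jane Doe", "accounts": [
            {"id": 492, "name": "yet another account", "balance": 21.12},
            {"id": 513, "name": "final account", "balance": 142.22}]},
    ],
}

by_member = fork_preserving_structure(sample, ["members"])
by_account = fork_preserving_structure(sample, ["members", "accounts"])
print(len(by_member), len(by_account))  # 2 4
```

With `["members"]` the sketch yields the two records listed above; with the nested path each member copy is split again, giving four records that each carry one member and one account.
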

If we took this approach, I think the processor would be a lot more powerful, because an UpdateRecord processor could be used next, if necessary, to pare down the records to the desired fields. This gives us a lot of flexibility with just 1 or 2 processors.

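
As a rough illustration of that follow-up step, the Python sketch below stands in for the kind of reshaping an UpdateRecord processor could be asked to do after such a fork: it promotes the single element of a forked array to the top level. It does not show how UpdateRecord is actually configured; `promote_single_element` is a hypothetical helper, and records are assumed to be plain dicts.

```python
from typing import Any, Dict

Record = Dict[str, Any]


def promote_single_element(record: Record, field: str) -> Record:
    """Lift the only element of record[field] to the top level (hypothetical helper).

    The other fields of `record` are kept. On a name clash the element's value
    wins, as in the "parent fields" output of the issue description, where
    the account "id" replaces the customer "id".
    """
    items = record[field]
    if len(items) != 1:
        raise ValueError(f"expected a 1-element array in {field!r}, got {len(items)}")
    flat = {key: value for key, value in record.items() if key != field}
    flat.update(items[0])  # element fields overwrite same-named parent fields
    return flat


# One record as the proposed ForkRecord might emit it for /accounts.
forked = {
    "id": 1, "name": "John Doe", "address": "123 My Street", "city": "My City",
    "state": "MS", "zipCode": "11111", "country": "USA",
    "accounts": [{"id": 42, "balance": 4750.89}],
}
flattened = promote_single_element(forked, "accounts")
print(flattened["id"], flattened["balance"])  # 42 4750.89
```

Applied to each record from a `/accounts` fork, this gives the same shape as the "parent fields" output in the issue description quoted below, which suggests the two-processor route can still reach that result.
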
Thoughts?
> Create a ForkRecord processor
> -----------------------------
>
> Key: NIFI-4227
> URL: https://issues.apache.org/jira/browse/NIFI-4227
> Project: Apache NiFi
> Issue Type: New Feature
> Components: Extensions
> Reporter: Pierre Villard
> Assignee: Pierre Villard
> Attachments: TestForkRecord.xml
>
>
> I'd like a way to fork a record containing an array of records into multiple
> records, each one being an element of the array. In addition, if configured
> to, I'd like the option to add to each new record the parent fields.
> For example, if I have:
> {noformat}
> [{
>     "id": 1,
>     "name": "John Doe",
>     "address": "123 My Street",
>     "city": "My City",
>     "state": "MS",
>     "zipCode": "11111",
>     "country": "USA",
>     "accounts": [{
>         "id": 42,
>         "balance": 4750.89
>     }, {
>         "id": 43,
>         "balance": 48212.38
>     }]
> },
> {
>     "id": 2,
>     "name": "Jane Doe",
>     "address": "345 My Street",
>     "city": "Her City",
>     "state": "NY",
>     "zipCode": "22222",
>     "country": "USA",
>     "accounts": [{
>         "id": 45,
>         "balance": 6578.45
>     }, {
>         "id": 46,
>         "balance": 34567.21
>     }]
> }]
> {noformat}
> Then, I want to generate records looking like:
> {noformat}
> [{
>     "id": 42,
>     "balance": 4750.89
> }, {
>     "id": 43,
>     "balance": 48212.38
> }, {
>     "id": 45,
>     "balance": 6578.45
> }, {
>     "id": 46,
>     "balance": 34567.21
> }]
> {noformat}
> Or, if parent fields are included, looking like:
> {noformat}
> [{
>     "name": "John Doe",
>     "address": "123 My Street",
>     "city": "My City",
>     "state": "MS",
>     "zipCode": "11111",
>     "country": "USA",
>     "id": 42,
>     "balance": 4750.89
> }, {
>     "name": "John Doe",
>     "address": "123 My Street",
>     "city": "My City",
>     "state": "MS",
>     "zipCode": "11111",
>     "country": "USA",
>     "id": 43,
>     "balance": 48212.38
> }, {
>     "name": "Jane Doe",
>     "address": "345 My Street",
>     "city": "Her City",
>     "state": "NY",
>     "zipCode": "22222",
>     "country": "USA",
>     "id": 45,
>     "balance": 6578.45
> }, {
>     "name": "Jane Doe",
>     "address": "345 My Street",
>     "city": "Her City",
>     "state": "NY",
>     "zipCode": "22222",
>     "country": "USA",
>     "id": 46,
>     "balance": 34567.21
> }]
> {noformat}
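
For comparison with the structure-preserving proposal in the review comment, here is a minimal Python sketch of the behavior this issue description asks for. It assumes records are plain dicts and handles only one top-level array field (the `/accounts` case); `fork_extract` is a hypothetical name, and its `include_parent_fields` flag only loosely mirrors the "Include parent fields" property mentioned in the diff. It reads the second example as the account `id` overriding the customer `id`, since the parent's `id` does not appear in that output.

```python
import copy
from typing import Any, Dict, List

Record = Dict[str, Any]


def fork_extract(records: List[Record], array_field: str,
                 include_parent_fields: bool = False) -> List[Record]:
    """Emit one record per element of `array_field` (hypothetical helper).

    Only a single top-level array field is handled, a simplification of a
    RecordPath such as /accounts. With include_parent_fields, the element is
    layered on top of the parent's remaining fields, so a clashing name such
    as "id" takes the element's value.
    """
    out: List[Record] = []
    for parent in records:
        for element in parent.get(array_field, []):
            if include_parent_fields:
                # Copy every parent field except the array being forked.
                merged = {k: copy.deepcopy(v) for k, v in parent.items() if k != array_field}
                merged.update(copy.deepcopy(element))
                out.append(merged)
            else:
                out.append(copy.deepcopy(element))
    return out


customers = [
    {"id": 1, "name": "John Doe", "address": "123 My Street", "city": "My City",
     "state": "MS", "zipCode": "11111", "country": "USA",
     "accounts": [{"id": 42, "balance": 4750.89}, {"id": 43, "balance": 48212.38}]},
    {"id": 2, "name": "Jane Doe", "address": "345 My Street", "city": "Her City",
     "state": "NY", "zipCode": "22222", "country": "USA",
     "accounts": [{"id": 45, "balance": 6578.45}, {"id": 46, "balance": 34567.21}]},
]

accounts_only = fork_extract(customers, "accounts")
with_parents = fork_extract(customers, "accounts", include_parent_fields=True)
print([r["id"] for r in accounts_only])  # [42, 43, 45, 46]
```

Unlike the reviewer's variant, this flattens immediately, so the nesting of the input is lost in a single step rather than left for a later processor to decide.
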
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)