GitHub user markap14 commented on a diff in the pull request:
https://github.com/apache/nifi/pull/2037#discussion_r133768739
--- Diff:
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/resources/docs/org.apache.nifi.processors.standard.ForkRecord/additionalDetails.html
---
@@ -0,0 +1,365 @@
+<!DOCTYPE html>
+<html lang="en">
+ <!--
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+ http://www.apache.org/licenses/LICENSE-2.0
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ -->
+ <head>
+ <meta charset="utf-8" />
+ <title>ForkRecord</title>
+
+ <link rel="stylesheet" href="../../../../../css/component-usage.css" type="text/css" />
+ </head>
+
+ <body>
+ <p>
+ ForkRecord allows the user to fork a record into multiple records. To do that, the user must specify
+ a <a href="../../../../../html/record-path-guide.html">RecordPath</a> pointing to a field of type
+ ARRAY containing RECORD elements. The generated flow file will contain the records from the specified
+ array. It is also possible to add in each record all the fields of the parent records from the root
+ level to the record element being forked. However it supposes the fields to add are defined in the
+ schema of the Record Writer controller service.
+ </p>
+
+ <h2>Examples</h2>
+
+ <p>
+ To better understand how this Processor works, we will lay out a few examples. For the sake of these examples, let's assume that our input
+ data is JSON formatted and looks like this:
+ </p>
+
+<code>
+<pre>
+[{
+ "id": 1,
+ "name": "John Doe",
+ "address": "123 My Street",
+ "city": "My City",
+ "state": "MS",
+ "zipCode": "11111",
+ "country": "USA",
+ "accounts": [{
+ "id": 42,
+ "balance": 4750.89
+ }, {
+ "id": 43,
+ "balance": 48212.38
+ }]
+},
+{
+ "id": 2,
+ "name": "Jane Doe",
+ "address": "345 My Street",
+ "city": "Her City",
+ "state": "NY",
+ "zipCode": "22222",
+ "country": "USA",
+ "accounts": [{
+ "id": 45,
+ "balance": 6578.45
+ }, {
+ "id": 46,
+ "balance": 34567.21
+ }]
+}]
+</pre>
+</code>
+
+
+ <h3>Example 1 - Fork without parent fields</h3>
+
+ <p>
+ For this case, we want to create one record per <code>account</code> and we don't care about
+ the other fields. We'll set the Record path property to <code>/accounts</code>. The resulting
+ flow file will contain 4 records and will look like (assuming the Record Writer schema is
+ correctly set):
+ </p>
+
+<code>
+<pre>
+[{
+ "id": 42,
+ "balance": 4750.89
+}, {
+ "id": 43,
+ "balance": 48212.38
+}, {
+ "id": 45,
+ "balance": 6578.45
+}, {
+ "id": 46,
+ "balance": 34567.21
+}]
+</pre>
+</code>
+
+
+ <h3>Example 2 - Fork with parent fields</h3>
+
+ <p>
+ Now, if we set the property "Include parent fields" to true, this will recursively include
--- End diff --
In such a case, I would actually have expected the result to have an
'accounts' field that is a 1-element array.
If we wanted to promote that up to the top level, an UpdateRecord processor
could be used.
So in general, the way that I would expect the processor to work is to
create a copy of the Record that contains the exact same structure as the
original, except that the array denoted by the RecordPath would contain a
single element in each output record. For example, if the input looked like:
```
{
  "id": 1,
  "members": [{
    "id": 42,
    "name": "John Doe",
    "accounts": [
      {
        "id": 382,
        "name": "first account",
        "balance": 17.82
      }, {
        "id": 482,
        "name": "other account",
        "balance": 182.34
      }
    ]
  }, {
    "id": 43,
    "name": "Jane Doe",
    "accounts": [
      {
        "id": 492,
        "name": "yet another account",
        "balance": 21.12
      }, {
        "id": 513,
        "name": "final account",
        "balance": 142.22
      }
    ]
  }]
}
```
Then, if we set the Record Path to `/members`, I would expect output of:
```
[{
  "id": 1,
  "members": [{
    "id": 42,
    "name": "John Doe",
    "accounts": [
      {
        "id": 382,
        "name": "first account",
        "balance": 17.82
      }, {
        "id": 482,
        "name": "other account",
        "balance": 182.34
      }
    ]
  }]
}, {
  "id": 1,
  "members": [{
    "id": 43,
    "name": "Jane Doe",
    "accounts": [
      {
        "id": 492,
        "name": "yet another account",
        "balance": 21.12
      }, {
        "id": 513,
        "name": "final account",
        "balance": 142.22
      }
    ]
  }]
}]
```
But if I set the Record Path to `/members[*]/accounts`, then I would expect
4 records to be output, each having a single element in the 'members' array
and each having a single element in the 'accounts' sub-array.
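To make the expected behavior concrete, here is a minimal Python sketch of
the idea. It is not NiFi code: it assumes records are plain dicts, and it
stands in for the RecordPath with a list of field names, so `["members"]`
plays the role of `/members` and `["members", "accounts"]` the role of
`/members[*]/accounts`. The `fork` helper is hypothetical and only
illustrates the "same structure, single-element arrays" rule.

```python
import copy


def fork(record, path):
    """Yield copies of `record` where each array named in `path` holds one element.

    `path` is a list of field names; this is an assumed stand-in for a
    RecordPath, not the real RecordPath syntax or API.
    """
    if not path:
        # Nothing left to fork at this level: emit an independent copy.
        yield copy.deepcopy(record)
        return
    head, rest = path[0], path[1:]
    for element in record[head]:
        # Fork deeper levels first, then wrap each result in a 1-element array.
        for forked in fork(element, rest):
            yield {
                key: [forked] if key == head else copy.deepcopy(value)
                for key, value in record.items()
            }


record = {
    "id": 1,
    "members": [
        {"id": 42, "name": "John Doe", "accounts": [
            {"id": 382, "name": "first account", "balance": 17.82},
            {"id": 482, "name": "other account", "balance": 182.34},
        ]},
        {"id": 43, "name": "Jane Doe", "accounts": [
            {"id": 492, "name": "yet another account", "balance": 21.12},
            {"id": 513, "name": "final account", "balance": 142.22},
        ]},
    ],
}

by_member = list(fork(record, ["members"]))
by_account = list(fork(record, ["members", "accounts"]))
print(len(by_member), len(by_account))  # 2 4
```

Every output record keeps the parent fields (`id`, the member's `name`, and
so on), which is what would let a follow-on step trim the result afterwards.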
If we took this approach, I think it would make the processor a lot more
powerful, because an UpdateRecord processor could be used next if necessary
to pare down the records to the desired fields. This gives us a lot of
flexibility with just 1 or 2 processors.
Thoughts?