This is an automated email from the ASF dual-hosted git repository.
pvillard pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/nifi.git
The following commit(s) were added to refs/heads/main by this push:
new 77c6f0a NIFI-9319 Make edits and corrections to latest additions to User Guide
77c6f0a is described below
commit 77c6f0a819d2b52d986042ab7dd5bed6ca500ae5
Author: Andrew Lim <[email protected]>
AuthorDate: Thu Oct 21 12:37:39 2021 -0400
NIFI-9319 Make edits and corrections to latest additions to User Guide
Signed-off-by: Pierre Villard <[email protected]>
This closes #5474.
---
.../asciidoc/images/configure-process-group.png | Bin 65302 -> 0 bytes
.../images/process-group-configuration-window.png | Bin 102300 -> 118585 bytes
nifi-docs/src/main/asciidoc/user-guide.adoc | 52 ++++++++++-----------
3 files changed, 26 insertions(+), 26 deletions(-)
diff --git a/nifi-docs/src/main/asciidoc/images/configure-process-group.png b/nifi-docs/src/main/asciidoc/images/configure-process-group.png
deleted file mode 100644
index 2b1076d..0000000
Binary files a/nifi-docs/src/main/asciidoc/images/configure-process-group.png and /dev/null differ
diff --git a/nifi-docs/src/main/asciidoc/images/process-group-configuration-window.png b/nifi-docs/src/main/asciidoc/images/process-group-configuration-window.png
index 8921129..58b9dd6 100644
Binary files a/nifi-docs/src/main/asciidoc/images/process-group-configuration-window.png and b/nifi-docs/src/main/asciidoc/images/process-group-configuration-window.png differ
diff --git a/nifi-docs/src/main/asciidoc/user-guide.adoc b/nifi-docs/src/main/asciidoc/user-guide.adoc
index b8ff7ca..7583759 100644
--- a/nifi-docs/src/main/asciidoc/user-guide.adoc
+++ b/nifi-docs/src/main/asciidoc/user-guide.adoc
@@ -210,8 +210,8 @@ The available component-level access policies are:
|view the component |Allows users to view component configuration details
|modify the component |Allows users to modify component configuration details
|view provenance |Allows users to view provenance events generated by this
component
-|view the data |Allows users to view metadata and content for this
component in flowfile queues in outbound connections and through provenance
events
-|modify the data |Allows users to empty flowfile queues in outbound
connections and submit replays through provenance events
+|view the data |Allows users to view metadata and content for this
component in FlowFile queues in outbound connections and through provenance
events
+|modify the data |Allows users to empty FlowFile queues in outbound
connections and submit replays through provenance events
|view the policies |Allows users to view the list of users who can view and
modify a component
|modify the policies |Allows users to modify the list of users who can view
and modify a component
|retrieve data via site-to-site |Allows a port to receive data from NiFi
instances
@@ -301,7 +301,7 @@ While the options available from the context menu vary, the
following options ar
NOTE: For Processors, Ports, Remote Process Groups, Connections and Labels, it
is possible to open the configuration dialog by double-clicking on the desired
component.
- *Start* or *Stop*: This option allows the user to start or stop a Processor;
the option will be either Start or Stop, depending on the current state of the
Processor.
-- *Run Once*: This option allows the user to run a selected Processor exactly
once. If the Processor is prevented from executing (e.g. there are no incoming
FlowFiles or the outgoing connection has back pressure applied) the Processor
won't get triggered. *Execution* settings apply - i.e. *Primary Node* and *All
Nodes* setting will result in running the Processor only once on the Primary
Node or one time on each of the nodes, respectively. Works only with *Timer
Driven* and *CRON driven* [...]
+- *Run Once*: This option allows the user to run a selected Processor exactly
once. If the Processor is prevented from executing (e.g., there are no incoming
FlowFiles or the outgoing connection has back pressure applied) the Processor
won't get triggered. *Execution* settings apply (i.e., *Primary Node* and *All
Nodes* settings will result in running the Processor only once on the Primary
Node or one time on each of the nodes, respectively). Works only with *Timer
driven* and *CRON driv [...]
- *Enable* or *Disable*: This option allows the user to enable or disable a
Processor; the option will be either Enable or Disable, depending on the
current state of the Processor.
- *View data provenance*: This option displays the NiFi Data Provenance table,
with information about data provenance events for the FlowFiles routed through
that Processor (see <<data_provenance>>).
- *View status history*: This option opens a graphical representation of the
Processor's statistical information over time.
@@ -653,7 +653,7 @@ The 'Run Schedule' dictates how often the Processor should
be scheduled to run.
Scheduling Strategy (see above). If using the Event driven Scheduling
Strategy, this field is not available. When using the Timer driven
Scheduling Strategy, this value is a time duration specified by a number
followed by a time unit. For example, `1 second` or `5 mins`.
The default value of `0 sec` means that the Processor should run as often as
possible as long as it has data to process. This is true
-for any time duration of 0, regardless of the time unit (i.e., `0 sec`, `0
mins`, `0 days`). For an explanation of values that are
+for any time duration of 0, regardless of the time unit (e.g., `0 sec`, `0
mins`, `0 days`). For an explanation of values that are
applicable for the CRON driven Scheduling Strategy, see the description of the
CRON driven Scheduling Strategy itself.
===== Execution
@@ -731,7 +731,7 @@ You can access additional documentation about each
Processor's usage by right-cl
=== Configuring a Process Group
To configure a Process Group, right-click on the Process Group and select the
`Configure` option from the context menu. The configuration dialog is opened
with two tabs: General and Controller Services.
-image::configure-process-group.png["Configure Process Group"]
+image::process-group-configuration-window.png["Configure Process Group"]
[[General_tab_ProcessGroup]]
@@ -740,7 +740,7 @@ This tab contains several different configuration items.
First is the Process Gr
The next configuration element is the Process Group Parameter Context, which
is used to provide parameters to components of the flow. From this drop-down,
the user is able to choose which Parameter Context should be bound to this
Process Group and can optionally create a new one to bind to the Process Group.
For more information refer to <<Parameters>> and <<parameter-contexts,Parameter
Contexts>>.
-The third element in the configuration dialog is the Process Group Comments.
This provides a mechanism for providing any useful information or context about
the Process Group.
+The third element in the configuration dialog is the Process Group Comments.
This provides a mechanism to add any useful information about the Process Group.
The next two elements, Process Group FlowFile Concurrency and Process Group
Outbound Policy, are covered in the following sections.
@@ -784,14 +784,14 @@ data that arrives at an Output Port is immediately
transferred out of the Proces
When the Outbound Policy is configured to "Batch Output", the Output Ports
will not transfer data out of the Process Group until
all data that is in the Process Group is queued up at an Output Port (i.e., no
data leaves the Process Group until all of the data has finished processing).
It doesn't matter whether the data is all queued up for the same Output Port,
or if some data is queued up for Output Port A while other data is queued up
-for Output Port B. These conditions are both considered the same in terms of
the completion of the FlowFile Processing.
+for Output Port B. These conditions are both considered the same in terms of
the completion of the FlowFile processing.
Using an Outbound Policy of "Batch Output" along with a FlowFile Concurrency
of "Single FlowFile Per Node" allows a user to easily ingest a single FlowFile
(which in and of itself may represent a batch of data) and then wait until all
processing of that FlowFile has completed before continuing on to the next step
in the dataflow (i.e., the next component outside of the Process Group).
Additionally, when using this mode, each FlowFile that is transferred out of
the Process Group
will be given a series of attributes named "batch.output.<Port Name>" for each
Output Port in the Process Group. The value will be equal to the number of
FlowFiles
-that were routed to that Output Port for this batch of data. For example,
consider a case where a single FlowFile is split into 5 FlowFiles, and two
FlowFiles go to Output Port A, one goes
-to Output Port B, and two go to Output Port C, and no FlowFiles go to Output
Port D. In this case, each FlowFile will have attributes `batch.output.A = 2`,
+that were routed to that Output Port for this batch of data. For example,
consider a case where a single FlowFile is split into 5 FlowFiles: two
FlowFiles go to Output Port A, one goes
+to Output Port B, two go to Output Port C, and no FlowFiles go to Output Port
D. In this case, each FlowFile will have attributes `batch.output.A = 2`,
`batch.output.B = 1`, `batch.output.C = 2`, `batch.output.D = 0`.
The Outbound Policy of "Batch Output" doesn't provide any benefits when used
in conjunction with a FlowFile Concurrency of "Unbounded".
@@ -801,7 +801,7 @@ As a result, the Outbound Policy is ignored if the FlowFile
Concurrency is set t
[[Connecting_Batch_Oriented_Groups]]
===== Connecting Batch-Oriented Process Groups
-A common use case in NiFi is to perform some batch-oriented process and only
after that process completes perform another process on that same batch of data.
+A common use case in NiFi is to perform some batch-oriented process and only
after that process completes, perform another process on that same batch of
data.
NiFi makes this possible by encapsulating each of these processes in its own
Process Group. The Outbound Policy of the first Process Group should be
configured as "Batch Output"
while the FlowFile Concurrency should be either "Single FlowFile Per Node" or
"Single Batch Per Node". With this configuration, the first Process Group
@@ -809,7 +809,7 @@ will process an entire batch of data (which will either be
a single FlowFile or
When processing has completed for that batch of data, the data will be held
until all FlowFiles are finished processing and ready to leave the Process
Group. At that point, the data can be transferred out of the Process Group as a
batch. This configuration - when a Process Group is configured with an Outbound
Policy of "Batch Output"
and an Output Port is connected directly to the Input Port of a Process Group
with a FlowFile Concurrency of "Single Batch Per Node" - is treated as a
slightly special case.
The receiving Process Group will ingest data not only until its input queues
are empty but until they are empty AND the source Process Group has transferred
all of the data from that
-batch out of the Process Group. This allows a collection of FlowFiles to be
transferred as a single batch of data between Process Groups - even if those
FlowFiles
+batch out of the Process Group. This allows a collection of FlowFiles to be
transferred as a single batch of data between Process Groups, even if those
FlowFiles
are spread across multiple ports.
@@ -837,10 +837,10 @@ See <<Backpressure>> for more information.
===== Default Settings for Connections
The final three elements in the Process Group configuration dialog are for
Default FlowFile Expiration, Default Back Pressure Object Threshold, and
Default Back Pressure Data Size Threshold. These settings configure the
default values when creating a new Connection. Each Connection represents a
queue,
-and every queue has settings for flowfile expiration, back pressure object
count, and back pressure data size. The settings specified here will effect the
-default values for all new Connections created within the Process Group; it
will not effect existing Connections. Child Process Groups created within the
-configured Process Group will inherit the default settings. Again, existing
Process Groups will not be effected. If not overridden with these options, the
-root Process Group obtains its default back pressure settings from
nifi.properties, and has a default FlowFile expiration of "0 sec", i.e. do not
expire.
+and every queue has settings for FlowFile expiration, back pressure object
count, and back pressure data size. The settings specified here will affect the
+default values for all new Connections created within the Process Group; it
will not affect existing Connections. Child Process Groups created within the
+configured Process Group will inherit the default settings. Again, existing
Process Groups will not be affected. If not overridden with these options, the
+root Process Group obtains its default back pressure settings from
`nifi.properties`, and has a default FlowFile expiration of "0 sec" (i.e., do
not expire).
NOTE: Setting the Default FlowFile Expiration to a non-zero value may lead to
data loss due to a FlowFile expiring as its time limit is reached.
@@ -918,7 +918,7 @@ The Referencing Components section now lists an aggregation
of all the component
==== Parameters and Expression Language
When adding a Parameter that makes use of the Expression Language, it is
important to understand the context in which the Expression Language will be
evaluated. The expression is always evaluated
-in the context of the Process or Controller Service that references the
Parameter. Take, for example, a scenario where Parameter with the name `Time`
is added with a value of `${now()}`. The
+in the context of the Processor or Controller Service that references the
Parameter. Take, for example, a scenario where a Parameter with the name `Time`
is added with a value of `${now()}`. The
Expression Language results in a call to determine the system time when it is
evaluated. When added as a Parameter, the system time is not evaluated when the
Parameter is added, but rather when a
Processor or Controller Service evaluates the Expression. That is, if a
Processor has a Property whose value is set to `#{Time}` it will function in
exactly the same manner as if the Property's
value were set to `${now()}`. Each time that the property is referenced, it
will produce a different timestamp.
@@ -1138,7 +1138,7 @@ image::variable-putfile-property.png["Processor Property
Using Variable"]
===== Variable Scope
-Variables are scoped by the Process Group they are defined in and are
available to any Processor defined at that level and below (i.e. any descendant
Processors).
+Variables are scoped by the Process Group they are defined in and are
available to any Processor defined at that level and below (i.e., any
descendant Processors).
Variables in a descendant group override the value in a parent group. More
specifically, if a variable `x` is declared at the root group and also declared
inside a process group, components inside the process group will use the value
of `x` defined in the process group.
@@ -1456,7 +1456,7 @@ The following prioritizers are available:
** Note that an UpdateAttribute processor should be used to add the "priority"
attribute to the FlowFiles before they reach a connection that has this
prioritizer set.
** If only one has that attribute it will go first.
** Values for the "priority" attribute can be alphanumeric, where "a" will
come before "z" and "1" before "9"
-** If "priority" attribute cannot be parsed as a long, unicode string ordering
will be used. For example: "99" and "100" will be ordered so the flowfile with
"99" comes first, but "A-99" and "A-100" will sort so the flowfile with "A-100"
comes first.
+** If "priority" attribute cannot be parsed as a long, unicode string ordering
will be used. For example: "99" and "100" will be ordered so the FlowFile with
"99" comes first, but "A-99" and "A-100" will sort so the FlowFile with "A-100"
comes first.
NOTE: With a <<load_balance_strategy>> configured, the connection has a queue
per node in addition to the local queue. The prioritizer will sort the data in
each queue independently.
@@ -1694,17 +1694,17 @@ be performed. The number of active tasks is shown in
the top-right corner of the
for more information). See <<terminating_tasks>> for how to terminate the
running tasks.
[[terminating_tasks]]
-=== Terminating a Component's tasks
+=== Terminating a Component's Tasks
When a component is stopped, it does not interrupt the currently running
tasks. This allows for the current execution to complete while no new
-tasks are scheduled, which is the desired behaviour in many cases. In some
cases, it is desirable to terminate the running tasks, particularly
+tasks are scheduled, which is the desired behavior in many cases. In some
cases, it is desirable to terminate the running tasks, particularly
in cases where a task has hung and is no longer responsive, or while
developing new flows.
To be able to terminate the running task(s), the component must first be
stopped (see <<stopping_components>>). Once the component is in the
-Stopped state, the Terminate option will become available only if there are
tasks still running (See <<processor_anatomy>>). The Terminate option
-(image:iconTerminate.png["Terminate"]) can be accessed either via the context
menu or the Operations Palette while the component is selected.
+Stopped state, the Terminate option will become available only if there are
tasks still running (see <<processor_anatomy>>). The Terminate option
+(image:iconTerminate.png["Terminate"]) can be accessed via the context menu or
the Operate Palette while the component is selected.
-The number of tasks that are actively being terminated will be displayed in
parentheses next to the number of active tasks e.g.
image:terminated-thread.png["Terminated-Threads"]. For example, if there is one
active task at the time that Terminate is selected, this will display "0 (1)" -
meaning
+The number of tasks that are actively being terminated will be displayed in
parentheses next to the number of active tasks
(image:terminated-thread.png["Terminated-Threads"]). For example, if there is
one active task at the time that Terminate is selected, this will display "0
(1)" - meaning
0 active tasks and 1 task being terminated.
A task may not terminate immediately, as different components may respond to
the Terminate command differently. However, the components can be
@@ -2160,7 +2160,7 @@ The FlowFiles enqueued in a Connection can be viewed when
necessary. The Queue l
a Connection's context menu. The listing will return the top 100 FlowFiles in
the active queue according to the
configured priority. The listing can be performed even if the source and
destination are actively running.
-Additionally, details for a Flowfile in the listing can be viewed by clicking
the "Details" button (image:iconDetails.png["Details"]) in the left most
column. From here, the FlowFile details and attributes are available as well as
buttons for
+Additionally, details for a FlowFile in the listing can be viewed by clicking
the "Details" button (image:iconDetails.png["Details"]) in the left most
column. From here, the FlowFile details and attributes are available as well as
buttons for
downloading or viewing the content. Viewing the content is only available if
the `nifi.content.viewer.url` has been configured.
If the source or destination of the Connection are actively running, there is
a chance that the desired FlowFile will
no longer be in the active queue.
@@ -2761,7 +2761,7 @@ The provenance event types are:
|FORK |Indicates that one or more FlowFiles were derived
from a parent FlowFile
|JOIN |Indicates that a single FlowFile is derived from
joining together multiple parent FlowFiles
|RECEIVE |Indicates a provenance event for receiving data from
an external process
-|REMOTE_INVOCATION |Indicates that a remote invocation was requested to
an external endpoint (e.g. deleting a remote resource)
+|REMOTE_INVOCATION |Indicates that a remote invocation was requested to
an external endpoint (e.g., deleting a remote resource)
|REPLAY |Indicates a provenance event for replaying a FlowFile
|ROUTE |Indicates that a FlowFile was routed to a specified
relationship and provides information about why the FlowFile was routed to this
relationship
|SEND |Indicates a provenance event for sending data to an
external process
@@ -2868,7 +2868,7 @@ java.arg.13=-XX:+UseG1GC
Many of the same system properties are supported by both the Persistent and
Write Ahead configurations, however the default values have been chosen for a
Persistent Provenance configuration. The following exceptions and
recommendations should be noted when changing to a Write Ahead configuration:
* `nifi.provenance.repository.journal.count` is not relevant to a Write Ahead
configuration
-* `nifi.provenance.repository.concurrent.merge.threads` and
`nifi.provenance.repository.warm.cache.frequency` are new properties. The
default values of `2` for threads and blank for frequency (i.e. disabled)
should remain for most installations.
+* `nifi.provenance.repository.concurrent.merge.threads` and
`nifi.provenance.repository.warm.cache.frequency` are new properties. The
default values of `2` for threads and blank for frequency (i.e., disabled)
should remain for most installations.
* Change the settings for `nifi.provenance.repository.max.storage.time`
(default value of `24 hours`) and `nifi.provenance.repository.max.storage.size`
(default value of `1 GB`) to values more suitable for your production
environment
* Change `nifi.provenance.repository.index.shard.size` from the default value
of `500 MB` to `4 GB`
* Change `nifi.provenance.repository.index.threads` from the default value of
`2` to either `4` or `8` as the Write Ahead repository enables this to scale
better