Repository: nifi-minifi-cpp
Updated Branches:
  refs/heads/master 002b07434 -> c719b40a7


MINIFICPP-328 Added processor documentation

This closes #214.

Signed-off-by: Marc Parisi <phroc...@apache.org>


Project: http://git-wip-us.apache.org/repos/asf/nifi-minifi-cpp/repo
Commit: http://git-wip-us.apache.org/repos/asf/nifi-minifi-cpp/commit/c719b40a
Tree: http://git-wip-us.apache.org/repos/asf/nifi-minifi-cpp/tree/c719b40a
Diff: http://git-wip-us.apache.org/repos/asf/nifi-minifi-cpp/diff/c719b40a

Branch: refs/heads/master
Commit: c719b40a7f9dd23f54e0222876680aedb988b786
Parents: 002b074
Author: Andy I. Christianson <a...@andyic.org>
Authored: Thu Nov 30 13:57:24 2017 -0500
Committer: Marc Parisi <phroc...@apache.org>
Committed: Wed Dec 13 12:55:29 2017 -0500

----------------------------------------------------------------------
 PROCESSORS.md | 725 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 README.md     |  44 ++--
 2 files changed, 749 insertions(+), 20 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/nifi-minifi-cpp/blob/c719b40a/PROCESSORS.md
----------------------------------------------------------------------
diff --git a/PROCESSORS.md b/PROCESSORS.md
new file mode 100644
index 0000000..09c7d24
--- /dev/null
+++ b/PROCESSORS.md
@@ -0,0 +1,725 @@
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+      http://www.apache.org/licenses/LICENSE-2.0
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+
+# Processors
+
+## Table of Contents
+
+- [AppendHostInfo](#appendhostinfo)
+- [ExecuteProcess](#executeprocess)
+- [ExecuteScript](#executescript)
+- [GetFile](#getfile)
+- [GetUSBCamera](#getusbcamera)
+- [GenerateFlowFile](#generateflowfile)
+- [InvokeHTTP](#invokehttp)
+- [LogAttribute](#logattribute)
+- [ListenHTTP](#listenhttp)
+- [ListenSyslog](#listensyslog)
+- [PutFile](#putfile)
+- [TailFile](#tailfile)
+- [MergeContent](#mergecontent)
+- [ExtractText](#extracttext)
+- [CompressContent](#compresscontent)
+- [FocusArchiveEntry](#focusarchiveentry)
+- [UnfocusArchiveEntry](#unfocusarchiveentry)
+- [ManipulateArchive](#manipulatearchive)
+- [PublishKafka](#publishkafka)
+
+## AppendHostInfo
+
+### Description
+
+Appends host information such as IP address and hostname  as an attribute to
+incoming flowfiles.
+
+### Properties
+
+In the list below, the names of required properties appear in bold. Any other
+properties (not in bold) are considered optional. The table also indicates any
+default values, and whether a property supports the NiFi Expression Language.
+
+| Name | Default Value | Allowable Values | Description |
+| - | - | - | - |
+| **Network Interface Name** | eth0 | | Network interface from which to read 
an IP v4 address |
+| **Hostname Attribute** | source.hostname |  | Flowfile attribute to used to 
record the agent's hostname |
+| **IP Attribute** | source.ipv4 |  | Flowfile attribute to used to record the 
agent's IP address |
+
+### Relationships
+
+| Name | Description |
+| - | - |
+| success | All FlowFiles are routed to this relationship. |
+
+## ExecuteProcess
+
+### Description
+
+Runs an operating system command specified by the user and writes the output of
+that command to a FlowFile. If the command is expected to be long-running, the
+Processor can output the partial data on a specified interval. When this option
+is used, the output is expected to be in textual format, as it typically does
+not make sense to split binary data on arbitrary time-based intervals.
+
+### Properties
+
+In the list below, the names of required properties appear in bold. Any other
+properties (not in bold) are considered optional. The table also indicates any
+default values, and whether a property supports the NiFi Expression Language.
+
+| Name | Default Value | Allowable Values | Description |
+| - | - | - | - |
+| **Command** | | | Specifies the command to be executed; if just the name of 
an executable is provided, it must be in the user's environment PATH. |
+| Command Arguments | | | The arguments to supply to the executable delimited 
by white space. White space can be escaped by enclosing it in double-quotes. |
+| Working Directory | | | The directory to use as the current working 
directory when executing the command |
+| **Batch Duration** | 0  || If the process is expected to be long-running and 
produce textual output, a batch duration can be specified. |
+| **Redirect Error Stream** | false | | If true will redirect any error stream 
output of the process to the output stream. |
+
+### Relationships
+
+| Name | Description |
+| - | - |
+| success | All created FlowFiles are routed to this relationship. |
+
+## ExecuteScript
+
+### Description
+
+Executes a script given the flow file and a process session. The script is
+responsible for handling the incoming flow file (transfer to SUCCESS or remove,
+e.g.) as well as any flow files created by the script. If the handling is
+incomplete or incorrect, the session will be rolled back.
+
+Scripts must define an onTrigger function which accepts NiFi Context and
+Property objects. For efficiency, scripts are executed once when the processor
+is run, then the onTrigger method is called for each incoming flowfile. This
+enables scripts to keep state if they wish, although there will be a script
+context per concurrent task of the processor. In order to, e.g., compute an
+arithmetic sum based on incoming flow file information, set the concurrent
+tasks to 1.
+
+Multiple variables are provided automatically to script contexts:
+
+| Name | Purpose |
+| - | - |
+| `log` | the logging object used for logging debug, info, warn, and error |
+| `REL_SUCCESS` | the "success" relationship |
+| `REL_FAILURE` | the "failure" relationship |
+
+### Python Notes
+
+- ExecuteScript uses native Python bindings, which means that any environment 
packages (e.g. those installed by pip) can be referenced.
+- Virtual environments do work.
+- It is possible to build MiNiFi - C++ against implementations which are 
compatible with CPython. CPython v2.7 as well as v3.4 do work, but only one 
version can be used at a time for a build.
+- We recommend building multiple MiNiFi - C++ executables if there is a 
requirement for operating multiple Python versions or implmentations 
concurrently.
+- Be mindful of the Global Interpreter Lock (GIL); While Python scripts will 
execute concurrently, execution is multiplexed from a single thread. Generally 
speaking, Python input/output and library calls with native implementations 
will release the GIL and enable concurrency, but long-running Python code such 
as loops will block other threads.
+
+### Lua Notes
+
+- Take care to use the `local` keyword where appropriate (such as inside of 
onTrigger functions and read/write callbacks) as scripts are long-running and 
this will help to avoid memory leaks.
+- The following Lua packages are enabled/available for scripts:
+  - `base`
+  - `os`
+  - `coroutine`
+  - `math`
+  - `io`
+  - `string`
+  - `table`
+  - `utf8`
+  - `package`
+
+#### Python Example: Reading a File
+
+```python
+import codecs
+
+class ReadCallback(object):
+  def process(self, input_stream):
+    content = codecs.getreader('utf-8')(input_stream).read()
+    log.info('file content: %s' % content)
+    return len(content)
+
+def onTrigger(context, session):
+  flow_file = session.get()
+
+  if flow_file is not None:
+    log.info('got flow file: %s' % flow_file.getAttribute('filename'))
+    session.read(flow_file, ReadCallback())
+    session.transfer(flow_file, REL_SUCCESS)
+```
+
+#### Python Example: Writing a File
+
+```python
+class WriteCallback(object):
+  def process(self, output_stream):
+    new_content = 'hello 2'.encode('utf-8')
+    output_stream.write(new_content)
+    return len(new_content)
+
+def onTrigger(context, session):
+  flow_file = session.get()
+  if flow_file is not None:
+    log.info('got flow file: %s' % flow_file.getAttribute('filename'))
+    session.write(flow_file, WriteCallback())
+    session.transfer(flow_file, REL_SUCCESS)
+```
+
+#### Lua Example: Reading a File
+
+```lua
+read_callback = {}
+
+function read_callback.process(self, input_stream)
+    local content = input_stream:read()
+    log:info('file content: ' .. content)
+    return #content
+end
+
+function onTrigger(context, session)
+  local flow_file = session:get()
+
+  if flow_file ~= nil then
+    log:info('got flow file: ' .. flow_file:getAttribute('filename'))
+    session:read(flow_file, read_callback)
+    session:transfer(flow_file, REL_SUCCESS)
+  end
+end
+```
+
+#### Lua Example: Writing a File
+
+```lua
+write_callback = {}
+
+function write_callback.process(self, output_stream)
+  local new_content = 'hello 2'
+  output_stream:write(new_content)
+  return #new_content
+end
+
+function onTrigger(context, session)
+  local flow_file = session:get()
+
+  if flow_file ~= nil then
+    log:info('got flow file: ' .. flow_file:getAttribute('filename'))
+    session:write(flow_file, write_callback)
+    session:transfer(flow_file, REL_SUCCESS)
+  end
+end
+```
+
+### Properties
+
+In the list below, the names of required properties appear in bold. Any other
+properties (not in bold) are considered optional. The table also indicates any
+default values, and whether a property supports the NiFi Expression Language.
+
+| Name | Default Value | Allowable Values | Description |
+| - | - | - | - |
+| **Script Engine** | python | python, lua | The engine to execute scripts |
+| Script File | | | Path to script file to execute.  Only one of Script File 
or Script Body may be used | 
+| Script Body | | | Body of script to execute.  Only one of Script File or 
Script Body may be used | 
+| Module Directory | | | Comma-separated list of paths to files and/or 
directories which contain modules required by the script | 
+
+### Relationships
+
+| Name | Description |
+| - | - |
+| success | Script successes |
+| failure | Script failures |
+
+## GetFile
+
+### Description
+
+Creates FlowFiles from files in a directory. NiFi will ignore files it doesn't
+have at least read permissions for.
+
+### Properties
+
+In the list below, the names of required properties appear in bold. Any other
+properties (not in bold) are considered optional. The table also indicates any
+default values, and whether a property supports the NiFi Expression Language.
+
+| Name | Default Value | Allowable Values | Description |
+| - | - | - | - |
+| **Batch Size** | 10 | | The maximum number of files to pull in each 
iteration |
+| **Input Directory** | . | | The input directory from which to pull files |
+| **Ignore Hidden Files** | true | | Indicates whether or not hidden files 
should be ignored |
+| **Keep Source File** | false | | If true, the file is not deleted after it 
has been copied to the Content Repository |
+| Maximum File Age | 0 sec | | The minimum age that a file must be in order to 
be pulled any file younger than this amount of time (according to last 
modification date) will be ignored |
+| **Minimum File Age** | 0 sec | | The maximum age that a file must be in 
order to be pulled; any file older than this amount of time (according to last 
modification date) will be ignored |
+| Maximum File Size | 0 B | | The maximum size that a file can be in order to 
be pulled |
+| **Minimum File Size** | 0 B | | The minimum size that a file must be in 
order to be pulled |
+| **Polling Interval** | 0 sec | | Indicates how long to wait before 
performing a directory listing |
+| **Recurse Subdirectories** | true | | Indicates whether or not to pull files 
from subdirectories |
+| **File Filter** | [^\\\\.].\* | | Only files whose names match the given 
regular expression will be picked up |
+
+### Relationships
+
+| Name | Description |
+| - | - |
+| success | All files are routed to success |
+
+## GetUSBCamera
+
+### Description
+
+Gets images from USB Video Class (UVC)-compatible devices. Outputs one frame
+per flow file at the rate specified by the `FPS` property in the format
+specified by the `Format` property.
+
+### Properties
+
+In the list below, the names of required properties appear in bold. Any other
+properties (not in bold) are considered optional. The table also indicates any
+default values, and whether a property supports the NiFi Expression Language.
+
+| Name | Default Value | Allowable Values | Description |
+| - | - | - | - |
+| **FPS** | 1 | | Frames per second to capture from USB camera |
+| **Format** | PNG | PNG, RAW | Frame format (currently only PNG and RAW are 
supported; RAW is a binary pixel buffer of RGB values) |
+| USB Vendor ID | | | USB Vendor ID of camera device, in hexadecimal format |
+| USB Product ID | | | USB Product ID of camera device, in hexadecimal format|
+| USB Serial No. | | | USB Serial No. of camera device |
+
+### Relationships
+
+| Name | Description |
+| - | - |
+| success | Sucessfully captured images sent here |
+| failure | Failures sent here |
+
+## GenerateFlowFile
+
+### Description
+
+This processor creates FlowFiles with random data or custom content.
+GenerateFlowFile is useful for load testing, configuration, and simulation.
+
+### Properties
+
+In the list below, the names of required properties appear in bold. Any other
+properties (not in bold) are considered optional. The table also indicates any
+default values, and whether a property supports the NiFi Expression Language.
+
+| Name | Default Value | Allowable Values | Description |
+| - | - | - | - |
+| **File Size** | 1 kB | | The size of the file that will be used |
+| **Batch Size** | 1 | | The number of FlowFiles to be transferred in each 
invocation |
+| **Data Format** | Binary | Binary, Text | Specifies whether the data should 
be Text or Binary|
+| **Unique FlowFiles** | true | true, false | If true, each FlowFile that is 
generated will be unique. If false, a random value will be generated and all 
FlowFiles will get the same content but this offers much higher throughput |
+
+### Relationships
+
+| Name | Description |
+| - | - |
+| success | Generated FlowFiles are sent here |
+
+## InvokeHTTP
+
+### Description
+
+An HTTP client processor which can interact with a configurable HTTP Endpoint.
+The destination URL and HTTP Method are configurable. FlowFile attributes are
+converted to HTTP headers and the FlowFile contents are included as the body of
+the request (if the HTTP Method is PUT, POST or PATCH).
+
+### Properties
+
+In the list below, the names of required properties appear in bold. Any other
+properties (not in bold) are considered optional. The table also indicates any
+default values, and whether a property supports the NiFi Expression Language.
+
+| Name | Default Value | Allowable Values | Description |
+| - | - | - | - |
+| **HTTP Method** | GET | | HTTP request method (GET, POST, PUT, PATCH, 
DELETE, HEAD, OPTIONS). Arbitrary methods are also supported. Methods other 
than POST, PUT and PATCH will be sent without a message body. |
+| **Remote URL** | | | Remote URL which will be connected to, including 
scheme, host, port, path.<br>**Supports Expression Language: true** |
+| **Connection Timeout** | 5 secs | | Max wait time for connection to remote 
service. |
+| **Read Timeout** | 15 secs | | Max wait time for response from remote 
service. |
+| **Include Date Header** | True | True, False | Include an RFC-2616 Date 
header in the request. |
+| **Follow Redirects** | True | | Follow HTTP redirects issued by remote 
server. |
+| Attributes to Send | | |  Regular expression that defines which attributes 
to send as HTTP headers in the request. If not defined, no attributes are sent 
as headers. |
+| SSL Context Service | | | The SSL Context Service used to provide client 
certificate information for TLS/SSL (https) connections. |
+| Proxy Host | | | The fully qualified hostname or IP address of the proxy 
server |
+| Proxy Port | | | The port of the proxy server |
+| invokehttp-proxy-user | | | Username to set when authenticating against 
proxy |
+| invokehttp-proxy-password | | |   Password to set when authenticating 
against proxy |
+| **Content-type** | application/octet-stream | | The Content-Type to specify 
for when content is being transmitted through a PUT, POST or PATCH. In the case 
of an empty value after evaluating an expression language expression, 
Content-Type defaults to application/octet-stream|
+| send-message-body | true | true, false | If true, sends the HTTP message 
body on POST/PUT/PATCH requests (default). If false, suppresses the message 
body and content-type header for these requests. |
+| **Use Chunked Encoding** | false | true, false | When POST'ing, PUT'ing or 
PATCH'ing content set this property to true in order to not pass the 
'Content-length' header and instead send 'Transfer-Encoding' with a value of 
'chunked'. This will enable the data transfer mechanism which was introduced in 
HTTP 1.1 to pass data of unknown lengths in chunks. |
+| Put Response Body in Attribute | | | If set, the response body received back 
will be put into an attribute of the original FlowFile instead of a separate 
FlowFile. The attribute key to put to is determined by evaluating value of this 
property. |
+| Always Output Response | false | true, false | Will force a response 
FlowFile to be generated and routed to the 'Response' relationship regardless 
of what the server status code received is or if the processor is configured to 
put the server response body in the request attribute. In the later 
configuration a request FlowFile with the response body in the attribute and a 
typical response FlowFile will be emitted to their respective relationships. |
+| Penalize on "No Retry" | false | true, false | Enabling this property will 
penalize FlowFiles that are routed to the "No Retry" relationship. |
+| Disable Peer Verification | false | true, false | Disables peer verification 
for the SSL session |
+
+### Relationships
+
+| Name | Description |
+| - | - |
+| success | All files are routed to success |
+| response | A Response FlowFile will be routed upon success (2xx status 
codes). If the 'Output Response Regardless' property is true then the response 
will be sent to this relationship regardless of the status code received. |
+| retry | The original FlowFile will be routed on any status code that can be 
retried (5xx status codes). It will have new attributes detailing the request. |
+| no retry | The original FlowFile will be routed on any status code that 
should NOT be retried (1xx, 3xx, 4xx status codes). It will have new attributes 
detailing the request. |
+| failure | The original FlowFile will be routed on any type of connection 
failure, timeout or general exception. It will have new attributes detailing 
the request. |
+
+## LogAttribute
+
+### Description
+
+Logs attributes of flow files in the MiNiFi application log.
+
+### Properties
+
+In the list below, the names of required properties appear in bold. Any other
+properties (not in bold) are considered optional. The table also indicates any
+default values, and whether a property supports the NiFi Expression Language.
+
+| Name | Default Value | Allowable Values | Description |
+| - | - | - | - |
+| **Log Level** | info | trace, debug, info, warn, error | The Log Level to 
use when logging the Attributes |
+| Attributes to Log |  | |  A comma-separated list of Attributes to Log. If 
not specified, all attributes will be logged. |
+| Attributes to Ignore |  | | A comma-separated list of Attributes to ignore. 
If not specified, no attributes will be ignored. |
+| **Log Payload** | false | true, false | If true, the FlowFile's payload will 
be logged, in addition to its attributes; otherwise, just the Attributes will 
be logged. |
+| Log prefix | | | Log prefix appended to the log lines. It helps to 
distinguish the output of multiple LogAttribute processors. |
+
+### Relationships
+
+| Name | Description |
+| - | - |
+| success | All FlowFiles are routed to this relationship |
+
+## ListenHTTP
+
+### Description
+
+Starts an HTTP Server and listens on a given base path to transform incoming
+requests into FlowFiles. The default URI of the Service will be
+`http://{hostname}:{port}/contentListener`. Only HEAD and POST requests are
+supported. GET, PUT, and DELETE will result in an error and the HTTP response
+status code 405.
+
+### Properties
+
+In the list below, the names of required properties appear in bold. Any other
+properties (not in bold) are considered optional. The table also indicates any
+default values, and whether a property supports the NiFi Expression Language.
+
+| Name | Default Value | Allowable Values | Description |
+| - | - | - | - |
+| **Base Path** | contentListener | | Base path for incoming connections |
+| **Listening Port** | | | The Port to listen on for incoming connections |
+| SSL Certificate | | | File containing PEM-formatted file including TLS/SSL 
certificate and key |
+| SSL Certificate Authority | | | File containing trusted PEM-formatted 
certificates |
+| SSL Verify Peer | no | yes, no | Whether or not to verify the client's 
certificate  |
+| SSL Minimum Version | SSL2 | SSL2, SSL3, TLS1.0, TLS1.1, TLS1.2 | Minimum 
TLS/SSL version allowed |
+| HTTP Headers to receive as Attributes (Regex) | | | Specifies the Regular 
Expression that determines the names of HTTP Headers that should be passed 
along as FlowFile attributes |
+
+### Relationships
+
+| Name | Description |
+| - | - |
+| success | Relationship for successfully received FlowFiles |
+
+## ListenSyslog
+
+### Description
+
+Listens for Syslog messages being sent to a given port over TCP or UDP.
+Incoming messages are checked against regular expressions for RFC5424 and
+RFC3164 formatted messages. The format of each message is:
+(\<PRIORITY\>)(VERSION )(TIMESTAMP) (HOSTNAME) (BODY) where version is
+optional. The timestamp can be an RFC5424 timestamp with a format of
+"yyyy-MM-dd'T'HH:mm:ss.SZ" or "yyyy-MM-dd'T'HH:mm:ss.S+hh:mm", or it can be an
+RFC3164 timestamp with a format of "MMM d HH:mm:ss". If an incoming messages
+matches one of these patterns, the message will be parsed and the individual
+pieces will be placed in FlowFile attributes, with the original message in the
+content of the FlowFile. If an incoming message does not match one of these
+patterns it will not be parsed and the syslog.valid attribute will be set to
+false with the original message in the content of the FlowFile. Valid messages
+will be transferred on the success relationship, and invalid messages will be
+transferred on the invalid relationship.
+
+### Properties
+
+In the list below, the names of required properties appear in bold. Any other
+properties (not in bold) are considered optional. The table also indicates any
+default values, and whether a property supports the NiFi Expression Language.
+
+| Name | Default Value | Allowable Values | Description |
+| - | - | - | - |
+| **Receive Buffer Size** | 65507 B | | The size of each buffer used to 
receive Syslog messages. Adjust this value appropriately based on the expected 
size of the incoming Syslog messages. When UDP is selected each buffer will 
hold one Syslog message. When TCP is selected messages are read from an 
incoming connection until the buffer is full, or the connection is closed. |
+| **Max Size of Socket Buffer** | 1 MB | | The maximum size of the socket 
buffer that should be used. This is a suggestion to the Operating System to 
indicate how big the socket buffer should be. If this value is set too low, the 
buffer may fill up before the data can be read, and incoming data will be 
dropped. |
+| **Max Number of TCP Connections** | 2 | | The maximum number of concurrent 
connections to accept Syslog messages in TCP mode. |
+| **Max Batch Size** | 1 | | The maximum number of Syslog events to add to a 
single FlowFile. If multiple events are available, they will be concatenated 
along with the \<Message Delimiter\> up to this configured maximum number of 
messages|
+| **Message Delimiter"** | \n | | Specifies the delimiter to place between 
Syslog messages when multiple messages are bundled together (see \<Max Batch 
Size\> property). |
+| **Parse Messages** | true | true, false | Indicates if the processor should 
parse the Syslog messages. If set to false, each outgoing FlowFile will only 
contain the sender, protocol, and port, and no additional attributes. |
+| **Port** | 514 | | The port for Syslog communication.|
+
+### Relationships
+
+| Name | Description |
+| - | - |
+| success | Syslog messages that match one of the expected formats will be 
sent out this relationship as a FlowFile per message. |
+| invalid | Syslog messages that do not match one of the expected formats will 
be sent out this relationship as a FlowFile per message. |
+
+## PutFile
+
+### Description
+
+Writes the contents of a FlowFile to the local file system
+
+### Properties
+
+In the list below, the names of required properties appear in bold. Any other
+properties (not in bold) are considered optional. The table also indicates any
+default values, and whether a property supports the NiFi Expression Language.
+
+| Name | Default Value | Allowable Values | Description |
+| - | - | - | - |
+| **Directory** | | | The directory to which files should be written. You may 
use expression language such as `/aa/bb/${path}`<br>**Supports Expression 
Language: true** |
+| **Conflict Resolution Strategy** | fail | replace, ignore, fail | Indicates 
what should happen when a file with the same name already exists in the output 
directory |
+| **Create Missing Directories** | true | true, false | If true, then missing 
destination directories will be created. If false, flowfiles are penalized and 
sent to failure. |
+
+### Relationships
+
+| Name | Description |
+| - | - |
+| success | Files that have been successfully written to the output directory 
are transferred to this relationship |
+| failure | Files that could not be written to the output directory for some 
reason are transferred to this relationship |
+
+## TailFile
+
+### Description
+
+"Tails" a file, or a list of files, ingesting data from the file as it is
+written to the file. The file is expected to be textual. Data is ingested only
+when a new line is encountered (carriage return or new-line character or
+combination). If the file to tail is periodically "rolled over", as is
+generally the case with log files, an optional Rolling Filename Pattern can be
+used to retrieve data from files that have rolled over, even if the rollover
+occurred while NiFi was not running (provided that the data still exists upon
+restart of NiFi). It is generally advisable to set the Run Schedule to a few
+seconds, rather than running with the default value of 0 secs, as this
+Processor will consume a lot of resources if scheduled very aggressively. At
+this time, this Processor does not support ingesting files that have been
+compressed when 'rolled over'.
+
+### Properties
+
+In the list below, the names of required properties appear in bold. Any other
+properties (not in bold) are considered optional. The table also indicates any
+default values, and whether a property supports the NiFi Expression Language.
+
+| Name | Default Value | Allowable Values | Description |
+| - | - | - | - |
+| **File to Tail** | | | Fully-qualified filename of the file that should be 
tailed |
+| **State File** | TailFileState | | Specifies the file that should be used 
for storing state about what data has been ingested so that upon restart NiFi 
can resume from where it left off |
+| Input Delimiter | | | Specifies the character that should be used for 
delimiting the data being tailed from the incoming file. |
+
+### Relationships
+
+| Name | Description |
+| - | - |
+| success | All FlowFiles are routed to this Relationship. |
+
+## MergeContent
+
+Merges a Group of FlowFiles together based on a user-defined strategy and
+packages them into a single FlowFile. It is recommended that the Processor be
+configured with only a single incoming connection, as Group of FlowFiles will
+not be created from FlowFiles in different connections. This processor updates
+the mime.type attribute as appropriate.
+
+### Properties
+
+In the list below, the names of required properties appear in bold. Any other
+properties (not in bold) are considered optional. The table also indicates any
+default values, and whether a property supports the NiFi Expression Language.
+
+| Name | Default Value | Allowable Values | Description |
+| - | - | - | - |
+| **Merge Strategy** | Defragment | Defragment, Bin-Packing Algorithm | |
+| **Merge Format** | Binary Concatenation | TAR, ZIP, "FlowFile Stream, v3", 
"FlowFile Stream, v2", "FlowFile Tar, v1", Binary Concatenation, Avro | 
Determines the format that will be used to merge the content. |
+| Correlation Attribute Name | | | Correlation Attribute Name |
+| **Delimiter Strategy** | Filename | Filename, Text | Determines if Header, 
Footer, and Demarcator should point to files containing the respective content, 
or if the values of the properties should be used as the content. |
+| Header File | | | Filename specifying the header to use |
+| Footer File | | | Filename specifying the footer to use |
+| Demarcator File | | | Filename specifying the demarcator to use |
+| **Keep Path** | false | false, true | If using the Zip or Tar Merge Format, 
specifies whether or not the FlowFiles' paths should be included in their entry 
|
+
+### Relationships
+
+| Name | Description |
+| - | - |
+| original | The FlowFiles that were used to create the bundle |
+| merged | The FlowFile containing the merged content |
+| failure | If the bundle cannot be created, all FlowFiles that would have 
been used to created the bundle will be transferred to failure |
+
+## ExtractText
+
+Extracts the content of a FlowFile and places it into an attribute.
+
+### Properties
+
+In the list below, the names of required properties appear in bold. Any other
+properties (not in bold) are considered optional. The table also indicates any
+default values, and whether a property supports the NiFi Expression Language.
+
+| Name | Default Value | Allowable Values | Description |
+| - | - | - | - |
+| **Attribute** | | | Attribute to set from content |
+| **Size Limit** | 2 MB | | Maximum number of bytes to read into the 
attribute. 0 for no limit. |
+
+### Relationships
+
+| Name | Description |
+| - | - |
+| success | All FlowFiles are routed to this Relationship. |
+
+## CompressContent
+
+Compresses or decompresses the contents of FlowFiles using a user-specified
+compression algorithm and updates the mime.type attribute as appropriate
+
+### Properties
+
+In the list below, the names of required properties appear in bold. Any other
+properties (not in bold) are considered optional. The table also indicates any
+default values, and whether a property supports the NiFi Expression Language.
+
+| Name | Default Value | Allowable Values | Description |
+| - | - | - | - |
+| **Compression Level** | 1 | 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 | The compression 
level to use; this is valid only when using GZIP compression. |
+| **Mode** | compress | compress, decompress | Indicates whether the processor 
should compress content or decompress content. |
+| **Compression Format** | use mime.type attribute | gzip, bzip2, xz-lzma2, 
lzma, snappy, snappy framed | The compression format to use. |
+| **Update Filename** | false | true, false | If true, will remove the 
filename extension when decompressing data (only if the extension indicates the 
appropriate compression format) and add the appropriate extension when 
compressing data |
+
+### Relationships
+
+| Name | Description |
+| - | - |
+| success | FlowFiles will be transferred to the success relationship after 
successfully being compressed or decompressed |
+| failure | FlowFiles will be transferred to the failure relationship if they 
fail to compress/decompress |
+
+## FocusArchiveEntry
+
+Allows manipulation of entries within an archive (e.g. TAR) by focusing on one
+entry within the archive at a time. When an archive entry is focused, that
+entry is treated as the content of the FlowFile and may be manipulated
+independently of the rest of the archive. To restore the FlowFile to its
+original state, use UnfocusArchiveEntry.
+
+Archives may be compressed.
+
+### Properties
+
+In the list below, the names of required properties appear in bold. Any other
+properties (not in bold) are considered optional. The table also indicates any
+default values, and whether a property supports the NiFi Expression Language.
+
+| Name | Default Value | Allowable Values | Description |
+| - | - | - | - |
+| **Path** |  | | The path of the archive entry to focus |
+
+### Relationships
+
+| Name | Description |
+| - | - |
+| success | The FlowFile, with the given entry focused, is sent to this 
relationship |
+
+## UnfocusArchiveEntry
+
+Restores a FlowFile which has had an archive entry focused via
+FocusArchiveEntry to its original state.
+
+### Relationships
+
+| Name | Description |
+| - | - |
+| success | All FlowFiles are routed to this Relationship |
+
+## ManipulateArchive
+
+Performs an operation which manipulates an archive without needing to split the
+archive into multiple FlowFiles.
+
+### Operations
+
+| Name | Description |
+| - | - |
+| `remove` | Removes the target |
+| `copy` | Copies the target to a new archive entry |
+| `move` | Moves the target to a new archive entry |
+| `touch` | Creates a new archive entry |
+
+### Properties
+
+In the list below, the names of required properties appear in bold. Any other
+properties (not in bold) are considered optional. The table also indicates any
+default values, and whether a property supports the NiFi Expression Language.
+
+| Name | Default Value | Allowable Values | Description |
+| - | - | - | - |
+| **Operation** | | touch, remove, copy, move | Operation to perform on the 
archive |
+| Target | | | An existing entry within the archive to perform the operation 
on |
+| Destination | | | Destination for operations (touch, move or copy) which 
result in new entries. |
+| Before | | | For operations which result in new entries, places the new 
entry before the entry specified by this property |
+| After | | | For operations which result in new entries, places the new entry 
after the entry specified by this property |
+
+### Relationships
+
+| Name | Description |
+| - | - |
+| success | FlowFiles will be transferred to the success relationship if the 
operation succeeds. |
+| failure | FlowFiles will be transferred to the failure relationship if the 
operation fails. |
+
+## PublishKafka 
+
+This Processor puts the contents of a FlowFile to a Topic in Apache Kafka. The
+content of a FlowFile becomes the contents of a Kafka message. This message is
+optionally assigned a key by using the \<Kafka Key\> Property.
+
+### Properties
+
+In the list below, the names of required properties appear in bold. Any other
+properties (not in bold) are considered optional. The table also indicates any
+default values, and whether a property supports the NiFi Expression Language.
+
+| Name | Default Value | Allowable Values | Description |
+| - | - | - | - |
+| **Known Brokers** | | | A comma-separated list of known Kafka Brokers in the 
format \<host\>:\<port\> |
+| **Topic Name** | | | The Kafka Topic of interest |
+| **Delivery Guarantee** | 1 | all, 1, 0 | Specifies the requirement for 
guaranteeing that a message is sent to Kafka |
+| Request Timeout | | | The ack timeout of the producer request in 
milliseconds |
+| Client Name | | | Client Name to use when communicating with Kafka |
+| Batch Size | | | Maximum number of messages batched in one MessageSet |
+| Queue Buffering Max Time | | | Delay to wait for messages in the producer 
queue to accumulate before constructing message batches |
+| Queue Max Buffer Size | | | Maximum total message size sum allowed on the 
producer queue |
+| Queue Max Message | | | Maximum number of messages allowed on the producer 
queue |
+| Compress Codec | none | none, gzip, snappy | compression codec to use for 
compressing message sets |
+| Max Flow Segment Size | | | Maximum flow content payload segment size for 
the kafka record |
+| Security Protocol | | plaintext, ssl, sasl\_plaintext, sasl\_ssl | Protocol 
used to communicate with brokers |
+| Security Cert | | | Path to client's public key (PEM) used for 
authentication |
+| Security Private Key | | | Path to client's private key (PEM) used for 
authentication |
+| Security Pass Phrase | | | Private key passphrase |
+
+### Relationships
+
+| Name | Description |
+| - | - |
+| success | Any FlowFile that is successfully sent to Kafka will be routed to 
this Relationship |
+| failure | Any FlowFile that cannot be sent to Kafka will be routed to this 
Relationship |
+

http://git-wip-us.apache.org/repos/asf/nifi-minifi-cpp/blob/c719b40a/README.md
----------------------------------------------------------------------
diff --git a/README.md b/README.md
index 716f9c6..9993832 100644
--- a/README.md
+++ b/README.md
@@ -41,29 +41,33 @@ Specific goals for MiNiFi are comprised of:
 
 Perspectives of the role of MiNiFi should be from the perspective of the agent 
acting immediately at, or directly adjacent to, source sensors, systems, or 
servers.
 
+### Processors
+
+MiNiFi - C++ supports the following processors:
+
+* [AppendHostInfo](PROCESSORS.md#appendhostinfo)
+* [ExecuteProcess](PROCESSORS.md#executeprocess)
+* [ExecuteScript](PROCESSORS.md#executescript)
+* [GetFile](PROCESSORS.md#getfile)
+* [GetUSBCamera](PROCESSORS.md#getusbcamera)
+* [GenerateFlowFile](PROCESSORS.md#generateflowfile)
+* [InvokeHTTP](PROCESSORS.md#invokehttp)
+* [LogAttribute](PROCESSORS.md#logattribute)
+* [ListenHTTP](PROCESSORS.md#listenhttp)
+* [ListenSyslog](PROCESSORS.md#listensyslog)
+* [PutFile](PROCESSORS.md#putfile)
+* [TailFile](PROCESSORS.md#tailfile)
+* [MergeContent](PROCESSORS.md#mergecontent)
+* [ExtractText](PROCESSORS.md#extracttext)
+* [CompressContent](PROCESSORS.md#compresscontent)
+* [FocusArchiveEntry](PROCESSORS.md#focusarchiveentry)
+* [UnfocusArchiveEntry](PROCESSORS.md#unfocusarchiveentry)
+* [ManipulateArchive](PROCESSORS.md#manipulatearchive)
+* [PublishKafka](PROCESSORS.md#publishkafka)
+
 ## Caveats
 * 0.4.0 represents a non-GA release, APIs and interfaces are subject to change
 * Build and usage currently only supports Linux and OS X environments. MiNiFi 
C++ can be built and run through the Windows Subsystem for Linux but we provide 
no support for this platform.
-* The processors currently implemented include:
-  * AppendHostInfo
-  * ExecuteProcess
-  * ExecuteScript
-  * GetFile
-  * GetUSBCamera
-  * GenerateFlowFile
-  * InvokeHTTP
-  * LogAttribute
-  * ListenHTTP
-  * ListenSyslog
-  * PutFile
-  * TailFile
-  * MergeContent
-  * ExtractText
-  * CompressContent
-  * FocusArchive
-  * UnfocusArchive
-  * ManipulateArchive
-  * PublishKafka 
 * Provenance events generation is supported and are persisted using RocksDB. 
Volatile repositories can be used on systems without persistent storage.
 
 ## System Requirements

Reply via email to