Repository: nifi
Updated Branches:
  refs/heads/master 60d88b5a6 -> 946f4a1a2


NIFI-3721 Added documentation for Encrypted Provenance Repositories to Admin 
Guide and User Guide.
Added screenshot of encrypted provenance repository contents on disk.
Added note about clearing existing provenance repository when switching to 
encrypted implementation (see PR 1686 @ 
https://github.com/apache/nifi/pull/1686#issuecomment-298432578).

This closes #1713.

Signed-off-by: Andy LoPresto <[email protected]>


Project: http://git-wip-us.apache.org/repos/asf/nifi/repo
Commit: http://git-wip-us.apache.org/repos/asf/nifi/commit/946f4a1a
Tree: http://git-wip-us.apache.org/repos/asf/nifi/tree/946f4a1a
Diff: http://git-wip-us.apache.org/repos/asf/nifi/diff/946f4a1a

Branch: refs/heads/master
Commit: 946f4a1a28454763ede76553a554ef62197fdc51
Parents: 60d88b5
Author: Andy LoPresto <[email protected]>
Authored: Thu Apr 27 16:06:57 2017 -0700
Committer: Andy LoPresto <[email protected]>
Committed: Mon May 1 17:19:25 2017 -0400

----------------------------------------------------------------------
 .../src/main/asciidoc/administration-guide.adoc |  51 +++++++++----
 .../main/asciidoc/images/encrypted-wapr-hex.png | Bin 0 -> 660856 bytes
 nifi-docs/src/main/asciidoc/user-guide.adoc     |  71 +++++++++++++++++++
 3 files changed, 110 insertions(+), 12 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/nifi/blob/946f4a1a/nifi-docs/src/main/asciidoc/administration-guide.adoc
----------------------------------------------------------------------
diff --git a/nifi-docs/src/main/asciidoc/administration-guide.adoc 
b/nifi-docs/src/main/asciidoc/administration-guide.adoc
index a07bc25..09af983 100644
--- a/nifi-docs/src/main/asciidoc/administration-guide.adoc
+++ b/nifi-docs/src/main/asciidoc/administration-guide.adoc
@@ -2102,28 +2102,29 @@ this property specifies the maximum amount of time to 
keep the archived data. It
 
 === Provenance Repository
 
-The Provenance Repository contains the information related to Data Provenance. 
The next three sections are for Provenance Repository properties.
+The Provenance Repository contains the information related to Data Provenance. 
The next four sections are for Provenance Repository properties.
 
 |====
 |*Property*|*Description*
-|nifi.provenance.repository.implementation|The Provenance Repository 
implementation. The default value is 
org.apache.nifi.provenance.PersistentProvenanceRepository.
+|nifi.provenance.repository.implementation|The Provenance Repository 
implementation. The default value is 
`org.apache.nifi.provenance.PersistentProvenanceRepository`.
 Two additional repositories are available as well.
 To store provenance events in memory instead of on disk (in which case all 
events will be lost on restart, and events will be evicted in a 
first-in-first-out order),
-set this property to org.apache.nifi.provenance.VolatileProvenanceRepository. 
This leaves a configurable number of Provenance Events in the Java heap, so the 
number
+set this property to 
`org.apache.nifi.provenance.VolatileProvenanceRepository`. This leaves a 
configurable number of Provenance Events in the Java heap, so the number
 of events that can be retained is very limited.
 
-As of Apache NiFi 1.2.0, a third option is available: 
org.apache.nifi.provenance.WriteAheadProvenanceRepository.
-This implementation was created to replace the PersistentProvenanceRepository. 
The PersistentProvenanceRepository was originally written with the simple goal 
of persisting
+As of Apache NiFi 1.2.0, a third and fourth option are available: 
`org.apache.nifi.provenance.WriteAheadProvenanceRepository` and 
`org.apache.nifi.provenance.EncryptedWriteAheadProvenanceRepository`.
+The `WriteAheadProvenanceRepository` implementation was created to replace the 
`PersistentProvenanceRepository`. The `PersistentProvenanceRepository` was 
originally written with the simple goal of persisting
 Provenance Events as they are generated and providing the ability to iterate 
over those events sequentially. Later, it was desired to be able to compress 
the data so that
 more data could be stored. After that, the ability to index and query the data 
was added. As requirements evolved over time, the repository kept changing 
without any major
-redesigns. When used in a NiFi instance that is responsible for processing 
large volumes of small FlowFiles, the PersistentProvenanceRepository can 
quickly become a bottleneck.
-The WriteAheadProvenanceRepository was then written to provide the same 
capabilities as the PersistentProvenanceRepository while providing far better 
performance.
-Changing to the WriteAheadProvenanceRepository is easy to accomplish, as the 
two repositories support most of the same properties.
-*Note Well*, however, the follow caveat: The WriteAheadProvenanceRepository 
will make use of the Provenance data stored by the 
PersistentProvenanceRepository. However, the
-PersistentProvenanceRepository may not be able to read the data written by the 
WriteAheadProvenanceRepository. Therefore, once the Provenance Repository is 
changed to use
-the WriteAheadProvenanceRepository, it cannot be changed back to the 
PersistentProvenanceRepository without deleting the data in the Provenance 
Repository. It is therefore
+redesigns. When used in a NiFi instance that is responsible for processing 
large volumes of small FlowFiles, the `PersistentProvenanceRepository` can 
quickly become a bottleneck.
+The `WriteAheadProvenanceRepository` was then written to provide the same 
capabilities as the `PersistentProvenanceRepository` while providing far better 
performance.
+Changing to the `WriteAheadProvenanceRepository` is easy to accomplish, as the 
two repositories support most of the same properties.
+
+*Note Well*, however, the following caveat: The 
`WriteAheadProvenanceRepository` will make use of the Provenance data stored by 
the `PersistentProvenanceRepository`. However, the
+`PersistentProvenanceRepository` may not be able to read the data written by 
the `WriteAheadProvenanceRepository`. Therefore, once the Provenance Repository 
is changed to use
+the `WriteAheadProvenanceRepository`, it cannot be changed back to the 
`PersistentProvenanceRepository` without deleting the data in the Provenance 
Repository. It is therefore
 recommended that before changing the implementation, users ensure that their 
version of NiFi is stable, in case any issue arises that causes the user to 
need to roll back to
-a previous version of NiFi that did not support the 
WriteAheadProvenanceRepository. It is for this reason that the default is still 
set to the PersistentProvenanceRepository
+a previous version of NiFi that did not support the 
`WriteAheadProvenanceRepository`. It is for this reason that the default is 
still set to the `PersistentProvenanceRepository`
 at this time.
 |====
 
@@ -2229,6 +2230,32 @@ Providing three total locations, including  
_nifi.provenance.repository.director
        are not fully utilized, this feature can result in far faster 
Provenance queries.
 |====
 
+[[encrypted-write-ahead-provenance-repository-properties]]
+=== Encrypted Write Ahead Provenance Repository Properties
+
+All of the properties defined above (see 
<<write-ahead-provenance-repository-properties,Write Ahead Repository 
Properties>>) still apply. Only encryption-specific properties are listed here. 
See <<user-guide.adoc#encrypted-provenance,Encrypted Provenance Repository in 
the User Guide>> for more information.
+
+|====
+|*Property*|*Description*
+|nifi.provenance.repository.debug.frequency|Controls the number of events 
processed between DEBUG statements documenting the performance metrics of the 
repository. This value is only used when DEBUG level statements are enabled in 
the log configuration.
+ |nifi.provenance.repository.encryption.key.provider.implementation|This is 
the fully-qualified class name of the **key provider**. A key provider is the 
datastore interface for accessing the encryption key to protect the provenance 
events. There are currently two implementations -- `StaticKeyProvider` which 
reads a key directly from `nifi.properties`, and `FileBasedKeyProvider` which 
reads *n* many keys from an encrypted file. The interface is extensible, and 
HSM-backed or other providers are expected in the future.
+ |nifi.provenance.repository.encryption.key.provider.location|The path to the 
key definition resource (empty for `StaticKeyProvider`, `./keys.nkp` or similar 
path for `FileBasedKeyProvider`). For future providers like an HSM, this may be 
a connection string or URL.
+ |nifi.provenance.repository.encryption.key.id|The active key ID to use for 
encryption (e.g. `Key1`).
+ |nifi.provenance.repository.encryption.key|The key to use for 
`StaticKeyProvider`. The key format is hex-encoded 
(`0123456789ABCDEFFEDCBA98765432100123456789ABCDEFFEDCBA9876543210`) but can 
also be encrypted using the <<encrypt-config_tool,`./encrypt-config.sh`>> tool 
in NiFi Toolkit.
+ |nifi.provenance.repository.encryption.key.id.*|Allows for additional keys to 
be specified for the `StaticKeyProvider`. For example, the line 
`nifi.provenance.repository.encryption.key.id.Key2=012...210` would provide an 
available key `Key2`.
+|====
+
+The simplest configuration is below:
+
+....
+nifi.provenance.repository.implementation=org.apache.nifi.provenance.EncryptedWriteAheadProvenanceRepository
+nifi.provenance.repository.debug.frequency=100
+nifi.provenance.repository.encryption.key.provider.implementation=org.apache.nifi.provenance.StaticKeyProvider
+nifi.provenance.repository.encryption.key.provider.location=
+nifi.provenance.repository.encryption.key.id=Key1
+nifi.provenance.repository.encryption.key=0123456789ABCDEFFEDCBA98765432100123456789ABCDEFFEDCBA9876543210
+....
+
 
 === Component Status Repository
 

http://git-wip-us.apache.org/repos/asf/nifi/blob/946f4a1a/nifi-docs/src/main/asciidoc/images/encrypted-wapr-hex.png
----------------------------------------------------------------------
diff --git a/nifi-docs/src/main/asciidoc/images/encrypted-wapr-hex.png 
b/nifi-docs/src/main/asciidoc/images/encrypted-wapr-hex.png
new file mode 100644
index 0000000..46dc54d
Binary files /dev/null and 
b/nifi-docs/src/main/asciidoc/images/encrypted-wapr-hex.png differ
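
Note on the `StaticKeyProvider` configuration in the administration-guide.adoc
hunk above: the example value of `nifi.provenance.repository.encryption.key` is
64 hexadecimal characters, i.e. 32 bytes. The following is a minimal Python
sketch for producing and sanity-checking such a value. The function names are
hypothetical and not part of NiFi, and the accepted sizes (16, 24, or 32 bytes)
are an assumption based on the AES-128/192/256 sizes the user guide text
mentions for wrapped keys, so NiFi's own validation may differ.

```python
import re
import secrets

# Assumed key sizes in bytes for AES-128, AES-192, and AES-256.
VALID_KEY_BYTES = (16, 24, 32)
VALID_HEX_LENGTHS = {2 * n for n in VALID_KEY_BYTES}


def generate_hex_key(num_bytes: int = 32) -> str:
    """Return a random key as uppercase hex, in the style of the doc example."""
    if num_bytes not in VALID_KEY_BYTES:
        raise ValueError(f"unsupported key size: {num_bytes} bytes")
    return secrets.token_hex(num_bytes).upper()


def is_valid_hex_key(value: str) -> bool:
    """Check that value is pure hex with one of the assumed AES key lengths."""
    if not re.fullmatch(r"[0-9A-Fa-f]+", value):
        return False
    return len(value) in VALID_HEX_LENGTHS


if __name__ == "__main__":
    # Print a line that could be pasted into nifi.properties.
    print("nifi.provenance.repository.encryption.key=" + generate_hex_key())
```

A value generated this way is still plaintext in `nifi.properties`; the diff
above notes that it can additionally be protected with the encrypt-config tool.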

http://git-wip-us.apache.org/repos/asf/nifi/blob/946f4a1a/nifi-docs/src/main/asciidoc/user-guide.adoc
----------------------------------------------------------------------
diff --git a/nifi-docs/src/main/asciidoc/user-guide.adoc 
b/nifi-docs/src/main/asciidoc/user-guide.adoc
index 60bd9ea..19eb001 100644
--- a/nifi-docs/src/main/asciidoc/user-guide.adoc
+++ b/nifi-docs/src/main/asciidoc/user-guide.adoc
@@ -1808,6 +1808,77 @@ Once "Expand" is selected, the graph is re-drawn to show 
the children and their
 
 image:expanded-events.png["Expanded Events"]
 
+[[encrypted-provenance]]
+=== Encrypted Provenance Repository
+While OS-level access control can offer some security over the provenance data 
written to the disk in a repository, there are scenarios where the data may be 
sensitive, compliance and regulatory requirements exist, or NiFi is running on 
hardware not under the direct control of the organization (cloud, etc.). In 
this case, the provenance repository allows for all data to be encrypted before 
being persisted to the disk.
+
+[WARNING]
+.Performance
+============
+The current implementation of the encrypted provenance repository intercepts 
the record writer and reader of `WriteAheadProvenanceRepository`, which offers 
significant performance improvements over the legacy 
`PersistentProvenanceRepository`. It uses the `AES/GCM` algorithm, which is 
fairly performant on commodity hardware. In most scenarios, the added cost will 
not be significant (unnoticeable on a flow with hundreds of provenance events 
per second, moderately noticeable on a flow with thousands to tens of thousands 
of events per second). However, administrators should perform their own risk 
assessment and performance analysis and decide how to move forward. Switching 
back and forth between encrypted/unencrypted implementations is not recommended 
at this time.
+============
+
+==== What is it?
+
+The `EncryptedWriteAheadProvenanceRepository` is a new implementation of the 
provenance repository which encrypts all event record information before it is 
written to the repository. This allows for storage on systems where OS-level 
access controls are not sufficient to protect the data while still allowing 
querying and access to the data through the NiFi UI/API.
+
+==== How does it work?
+
+The `WriteAheadProvenanceRepository` was introduced in NiFi 1.2.0 and provided 
a refactored and much faster provenance repository implementation than the 
previous `PersistentProvenanceRepository`. The encrypted version wraps that 
implementation with a record writer and reader which encrypt and decrypt the 
serialized bytes respectively.
+
+The fully qualified class 
`org.apache.nifi.provenance.EncryptedWriteAheadProvenanceRepository` is 
specified as the provenance repository implementation in `nifi.properties` as 
the value of `nifi.provenance.repository.implementation`. In addition, 
<<administration-guide.adoc#encrypted-write-ahead-provenance-repository-properties,new
 properties>> must be populated to allow successful initialization.
+
+===== StaticKeyProvider
+The `StaticKeyProvider` implementation defines keys directly in 
`nifi.properties`. Individual keys are provided in hexadecimal encoding. The 
keys can also be encrypted like any other sensitive property in 
`nifi.properties` using the 
<<administration-guide.adoc#encrypt-config_tool,`./encrypt-config.sh`>> tool in 
the NiFi Toolkit.
+
+The following configuration section would result in a key provider with two 
available keys, "Key1" (active) and "AnotherKey".
+....
+nifi.provenance.repository.encryption.key.provider.implementation=org.apache.nifi.provenance.StaticKeyProvider
+nifi.provenance.repository.encryption.key.id=Key1
+nifi.provenance.repository.encryption.key=0123456789ABCDEFFEDCBA98765432100123456789ABCDEFFEDCBA9876543210
+nifi.provenance.repository.encryption.key.id.AnotherKey=0101010101010101010101010101010101010101010101010101010101010101
+....
+
+===== FileBasedKeyProvider
+The `FileBasedKeyProvider` implementation reads from an encrypted definition 
file of the format:
+
+....
+key1=NGCpDpxBZNN0DBodz0p1SDbTjC2FG5kp1pCmdUKJlxxtcMSo6GC4fMlTyy1mPeKOxzLut3DRX+51j6PCO5SznA==
+key2=GYxPbMMDbnraXs09eGJudAM5jTvVYp05XtImkAg4JY4rIbmHOiVUUI6OeOf7ZW+hH42jtPgNW9pSkkQ9HWY/vQ==
+key3=SFe11xuz7J89Y/IQ7YbJPOL0/YKZRFL/VUxJgEHxxlXpd/8ELA7wwN59K1KTr3BURCcFP5YGmwrSKfr4OE4Vlg==
+key4=kZprfcTSTH69UuOU3jMkZfrtiVR/eqWmmbdku3bQcUJ/+UToecNB5lzOVEMBChyEXppyXXC35Wa6GEXFK6PMKw==
+key5=c6FzfnKm7UR7xqI2NFpZ+fEKBfSU7+1NvRw+XWQ9U39MONWqk5gvoyOCdFR1kUgeg46jrN5dGXk13sRqE0GETQ==
+....
+
+Each line defines a key ID followed by the Base64-encoded cipher text of a 
16-byte IV and a wrapped AES-128, AES-192, or AES-256 key, depending on the JCE 
policies available. The individual keys are wrapped by AES/GCM encryption using 
the **master key** defined by `nifi.bootstrap.sensitive.key` in 
`conf/bootstrap.conf`.
+
+===== Key Rotation
+To rotate keys, update `nifi.properties` to reference a new key ID in 
`nifi.provenance.repository.encryption.key.id`. Previously encrypted events can 
still be decrypted as long as the old key remains available in the key 
definition file or as `nifi.provenance.repository.encryption.key.id.<OldKeyID>`, 
because the key ID is serialized alongside the encrypted record.
+
+==== Writing and Reading Event Records
+Once the repository is initialized, all provenance event record write 
operations are serialized according to the configured schema writer 
(`EventIdFirstSchemaRecordWriter` by default for 
`WriteAheadProvenanceRepository`) to a `byte[]`. Those bytes are then encrypted 
using an implementation of `ProvenanceEventEncryptor` (the only current 
implementation is `AES/GCM/NoPadding`) and the encryption metadata (`keyId`, 
`algorithm`, `version`, `IV`) is serialized and prepended. The complete 
`byte[]` is then written to the repository on disk as normal.
+
+image:encrypted-wapr-hex.png["Encrypted provenance repository file on disk"]
+
+On record read, the process is reversed. The encryption metadata is parsed and 
used to decrypt the serialized bytes, which are then deserialized into a 
`ProvenanceEventRecord` object. The delegation to the normal schema record 
writer/reader allows for "random-access" (i.e. immediate seek without 
decryption of unnecessary records).
+
+Within the NiFi UI/API, there is no detectable difference between an encrypted 
and unencrypted provenance repository. The Provenance Query operations work as 
expected with no change to the process.
+
+==== Potential Issues
+
+[WARNING]
+.Switching Implementations
+============
+When switching between implementation "families" (e.g. from 
`VolatileProvenanceRepository` or `PersistentProvenanceRepository` to 
`EncryptedWriteAheadProvenanceRepository`), the existing repository must be 
cleared from the file system before starting NiFi. A terminal command like 
`localhost:$NIFI_HOME $ rm -rf provenance_repository/` is sufficient.
+============
+
+* Switching between unencrypted and encrypted repositories
+** If a user has an existing repository (`WriteAheadProvenanceRepository` only 
-- **not** `PersistentProvenanceRepository`) that is not encrypted and switches 
their configuration to use an encrypted repository, the application writes an 
error to the log but starts up. However, previous events are not accessible 
through the provenance query interface and new events will overwrite the 
existing events. The same behavior occurs if a user switches from an encrypted 
repository to an unencrypted repository. Automatic roll-over is a future effort 
(https://issues.apache.org/jira/browse/NIFI-3722[NIFI-3722]), but NiFi is not 
intended for long-term storage of provenance events, so the impact should be 
minimal. Once implemented, roll-over would need to handle two scenarios:
+*** Encrypted -> unencrypted -- if the previous repository implementation was 
encrypted, these events should be handled seamlessly as long as the key 
provider available still has the keys used to encrypt the events (see **Key 
Rotation**)
+*** Unencrypted -> encrypted -- if the previous repository implementation was 
unencrypted, these events should be handled seamlessly as the previously 
recorded events simply need to be read with a plaintext schema record reader 
and then written back with the encrypted record writer
+** There is also a future effort to provide a standalone tool in NiFi Toolkit 
to encrypt/decrypt an existing provenance repository to make the transition 
easier. The translation process could take a long time depending on the size of 
the existing repository, and being able to perform this task outside of 
application startup would be valuable 
(https://issues.apache.org/jira/browse/NIFI-3723[NIFI-3723]).
+* Multiple repositories -- No additional effort or testing has been applied to 
multiple repositories at this time. It is possible, even likely, that issues 
will occur with repositories on different physical devices. There is no option 
to provide a heterogeneous environment (i.e. one encrypted, one plaintext 
repository).
+* Corruption -- when a disk is filled or corrupted, there have been reported 
issues with the repository becoming corrupted and recovery steps are necessary. 
This is likely to continue to be an issue with the encrypted repository, 
although still limited in scope to individual records (i.e. an entire 
repository file won't be irrecoverable due to the encryption).
 
 [[other_management_features]]
 Other Management Features
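
Note on the StaticKeyProvider and Key Rotation text added in the user-guide.adoc
hunk above: the property layout it describes (one active key ID, one active key,
and any number of `...key.id.<KeyID>` entries) can be modelled with a short
Python sketch. This is an illustrative model of the documented behavior only,
not NiFi's actual parsing code; `load_static_keys` and `key_for_record` are
hypothetical names.

```python
KEY_PROP = "nifi.provenance.repository.encryption.key"
ACTIVE_ID_PROP = KEY_PROP + ".id"
EXTRA_KEY_PREFIX = ACTIVE_ID_PROP + "."


def load_static_keys(properties_text: str):
    """Return (active_key_id, {key_id: hex_key}) from nifi.properties-style text."""
    props = {}
    for line in properties_text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not name=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        props[name.strip()] = value.strip()

    active_id = props.get(ACTIVE_ID_PROP)
    keys = {}
    # The active key is stored under the plain ".key" property.
    if active_id and props.get(KEY_PROP):
        keys[active_id] = props[KEY_PROP]
    # Additional keys use the ".key.id.<KeyID>" form.
    for name, value in props.items():
        if name.startswith(EXTRA_KEY_PREFIX):
            keys[name[len(EXTRA_KEY_PREFIX):]] = value
    return active_id, keys


def key_for_record(keys: dict, record_key_id: str) -> str:
    """Look up the key named by the key ID stored alongside a record."""
    if record_key_id not in keys:
        raise KeyError(f"no key available for ID {record_key_id!r}")
    return keys[record_key_id]


EXAMPLE = "\n".join([
    ACTIVE_ID_PROP + "=Key1",
    KEY_PROP + "=0123456789ABCDEFFEDCBA98765432100123456789ABCDEFFEDCBA9876543210",
    EXTRA_KEY_PREFIX + "AnotherKey=" + "01" * 32,
])

if __name__ == "__main__":
    active, keys = load_static_keys(EXAMPLE)
    print(active, sorted(keys))
```

In this model, rotating to `AnotherKey` means making it the active ID and keeping
the old key as a `...key.id.Key1` entry, so that records written under `Key1`
can still be resolved by their stored key ID.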
