Github user andrewmlim commented on a diff in the pull request:
https://github.com/apache/nifi/pull/1713#discussion_r114130948
--- Diff: nifi-docs/src/main/asciidoc/user-guide.adoc ---
@@ -1799,6 +1799,70 @@ Once "Expand" is selected, the graph is re-drawn to
show the children and their
image:expanded-events.png["Expanded Events"]
+[[encrypted-provenance]]
+=== Encrypted Provenance Repository
+While OS-level access control can offer some security over the provenance
data written to the disk in a repository, there are scenarios where the data
may be sensitive, compliance and regulatory requirements apply, or NiFi is running on
hardware not under the direct control of the organization (cloud, etc.). In
this case, the provenance repository allows for all data to be encrypted before
being persisted to the disk.
+
+[WARNING]
+.Performance
+============
+The current implementation of the encrypted provenance repository
intercepts the record writer and reader of `WriteAheadProvenanceRepository`,
which offers significant performance improvements over the legacy
`PersistentProvenanceRepository` and uses the `AES/GCM` algorithm, which is
fairly performant on commodity hardware. In most scenarios, the added cost will
not be significant (unnoticeable on a flow with hundreds of provenance events
per second, moderately noticeable on a flow with thousands - tens of thousands
of events per second). However, administrators should perform their own risk
assessment and performance analysis and decide how to move forward. Switching
back and forth between encrypted/unencrypted implementations is not recommended
at this time.
+============
+
+==== What is it?
+
+The `EncryptedWriteAheadProvenanceRepository` is a new implementation of
the provenance repository which encrypts all event record information before it
is written to the repository. This allows for storage on systems where OS-level
access controls are not sufficient to protect the data while still allowing
querying and access to the data through the NiFi UI/API.
+
+==== How does it work?
+
+The `WriteAheadProvenanceRepository` was introduced in NiFi 1.2.0 and
provided a refactored and much faster provenance repository implementation than
the previous `PersistentProvenanceRepository`. The encrypted version wraps that
implementation with a record writer and reader which encrypt and decrypt the
serialized bytes respectively.
+
+The fully qualified class
`org.apache.nifi.provenance.EncryptedWriteAheadProvenanceRepository` is
specified as the provenance repository implementation in `nifi.properties` as
the value of `nifi.provenance.repository.implementation`. In addition,
<<administration-guide.adoc#encrypted-write-ahead-provenance-repository-properties,new
properties>> must be populated to allow successful initialization.
+
+===== StaticKeyProvider
+The `StaticKeyProvider` implementation defines keys directly in
`nifi.properties`. Individual keys are provided in hexadecimal encoding (can
also be encrypted like any other sensitive property in `nifi.properties` using
the <<administration-guide.adoc#encrypt-config_tool,`./encrypt-config.sh`>>
tool in the NiFi Toolkit).
+
+The following configuration section would result in a key provider with
two available keys, "Key1" (active) and "AnotherKey".
+....
+nifi.provenance.repository.encryption.key.provider.implementation=org.apache.nifi.provenance.StaticKeyProvider
+nifi.provenance.repository.encryption.key.id=Key1
+nifi.provenance.repository.encryption.key=0123456789ABCDEFFEDCBA98765432100123456789ABCDEFFEDCBA9876543210
+nifi.provenance.repository.encryption.key.id.AnotherKey=0101010101010101010101010101010101010101010101010101010101010101
+....
+
+===== FileBasedKeyProvider
+The `FileBasedKeyProvider` implementation reads from an encrypted
definition file of the format:
+
+....
+key1=NGCpDpxBZNN0DBodz0p1SDbTjC2FG5kp1pCmdUKJlxxtcMSo6GC4fMlTyy1mPeKOxzLut3DRX+51j6PCO5SznA==
+key2=GYxPbMMDbnraXs09eGJudAM5jTvVYp05XtImkAg4JY4rIbmHOiVUUI6OeOf7ZW+hH42jtPgNW9pSkkQ9HWY/vQ==
+key3=SFe11xuz7J89Y/IQ7YbJPOL0/YKZRFL/VUxJgEHxxlXpd/8ELA7wwN59K1KTr3BURCcFP5YGmwrSKfr4OE4Vlg==
+key4=kZprfcTSTH69UuOU3jMkZfrtiVR/eqWmmbdku3bQcUJ/+UToecNB5lzOVEMBChyEXppyXXC35Wa6GEXFK6PMKw==
+key5=c6FzfnKm7UR7xqI2NFpZ+fEKBfSU7+1NvRw+XWQ9U39MONWqk5gvoyOCdFR1kUgeg46jrN5dGXk13sRqE0GETQ==
+....
+
+Each line defines a key ID and then the Base64-encoded cipher text of a 16
byte IV and wrapped AES-128, AES-192, or AES-256 key depending on the JCE
policies available. The individual keys are wrapped by AES/GCM encryption using
the **master key** defined by `nifi.bootstrap.sensitive.key` in
`conf/bootstrap.conf`.
+
+===== Key Rotation
+Simply update `nifi.properties` to reference a new key ID in
`nifi.provenance.repository.encryption.key.id`. Previously-encrypted events can
still be decrypted as long as that key is still available in the key definition
file or `nifi.provenance.repository.encryption.key.id.<OldKeyID>` as the key ID
is serialized alongside the encrypted record.
+
+==== Writing and Reading Event Records
+Once the repository is initialized, all provenance event record write
operations are serialized according to the configured schema writer
(`EventIdFirstSchemaRecordWriter` by default for
`WriteAheadProvenanceRepository`) to a `byte[]`. Those bytes are then encrypted
using an implementation of `ProvenanceEventEncryptor` (the only current
implementation is `AES/GCM/NoPadding`) and the encryption metadata (`keyId`,
`algorithm`, `version`, `IV`) is serialized and prepended. The complete
`byte[]` is then written to the repository on disk as normal.
+
+image:encrypted-wapr-hex.png["Encrypted provenance repository file on
disk"]
+
+On record read, the process is reversed. The encryption metadata is parsed
and used to decrypt the serialized bytes, which are then deserialized into a
`ProvenanceEventRecord` object. The delegation to the normal schema record
writer/reader allows for "random-access" (i.e. immediate seek without
decryption of unnecessary records).
+
+Within the NiFi UI/API, there is no detectable difference between an
encrypted and unencrypted provenance repository. The Provenance Query
operations work as expected with no change to the process.
+
+==== Potential Issues
+* Switching between unencrypted and encrypted repositories
+** If a user has an existing repository that is not encrypted and switches
their configuration to use an encrypted repository, the application writes an
error to the log but starts up. However, previous events are not accessible
through the provenance query interface and new events will overwrite the
existing events. The same behavior occurs if a user switches from an encrypted
repository to an unencrypted repository. Automatic roll-over is a future effort
(https://issues.apache.org/jira/browse/NIFI-3722[NIFI-3722]) but NiFi is not
intended to be long-term storage for provenance events so the impact should be
minimal.
--- End diff --
Change to: Automatic roll-over is a future effort
(https://issues.apache.org/jira/browse/NIFI-3722[NIFI-3722]) but NiFi is not
intended for long-term storage of provenance events so the impact should be
minimal.
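
For anyone reading the "Writing and Reading Event Records" and "Key Rotation" text quoted in the diff, the flow it describes (encrypt the serialized bytes with `AES/GCM/NoPadding`, prepend `keyId`, `algorithm`, `version`, and `IV`, then reverse the steps on read) can be sketched roughly as below. This is only an illustration, not the NiFi implementation: the class name `EncryptedRecordSketch`, the `"v1"` version string, the 16-byte record IV, the 128-bit tag, and the `DataOutputStream` metadata layout are all assumptions made for the example.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.security.SecureRandom;
import java.util.Map;

// Hypothetical sketch; names and on-disk layout are assumptions, not NiFi's format.
final class EncryptedRecordSketch {
    private static final String ALGORITHM = "AES/GCM/NoPadding";
    private static final String VERSION = "v1"; // assumed version marker
    private static final int IV_LENGTH = 16;    // assumed IV size in bytes
    private static final int TAG_BITS = 128;    // assumed GCM tag size

    // Encrypts the serialized record and prepends the encryption metadata.
    static byte[] encrypt(String keyId, byte[] key, byte[] serializedRecord) throws Exception {
        byte[] iv = new byte[IV_LENGTH];
        new SecureRandom().nextBytes(iv); // fresh IV for every record

        Cipher cipher = Cipher.getInstance(ALGORITHM);
        cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                new GCMParameterSpec(TAG_BITS, iv));
        byte[] cipherText = cipher.doFinal(serializedRecord);

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeUTF(keyId);     // lets the reader pick the right key later
        out.writeUTF(ALGORITHM);
        out.writeUTF(VERSION);
        out.write(iv);
        out.writeInt(cipherText.length);
        out.write(cipherText);
        out.flush();
        return bytes.toByteArray();
    }

    // Parses the metadata, looks up the key by ID, and decrypts the record bytes.
    static byte[] decrypt(Map<String, byte[]> availableKeys, byte[] stored) throws Exception {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(stored));
        String keyId = in.readUTF();
        String algorithm = in.readUTF();
        in.readUTF(); // version, unused in this sketch
        byte[] iv = new byte[IV_LENGTH];
        in.readFully(iv);
        byte[] cipherText = new byte[in.readInt()];
        in.readFully(cipherText);

        byte[] key = availableKeys.get(keyId);
        if (key == null) {
            throw new IllegalStateException("No key available for ID " + keyId);
        }
        Cipher cipher = Cipher.getInstance(algorithm);
        cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
                new GCMParameterSpec(TAG_BITS, iv));
        return cipher.doFinal(cipherText);
    }
}
```

Because the key ID travels with each record, a record written under an old key still decrypts after the active key changes, provided the old key remains in the map passed to `decrypt`. That is the property the Key Rotation paragraph relies on.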