GitHub user andrewmlim commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1713#discussion_r114132293
  
    --- Diff: nifi-docs/src/main/asciidoc/user-guide.adoc ---
    @@ -1799,6 +1799,70 @@ Once "Expand" is selected, the graph is re-drawn to 
show the children and their
     
     image:expanded-events.png["Expanded Events"]
     
    +[[encrypted-provenance]]
    +=== Encrypted Provenance Repository
    +While OS-level access control can offer some security over the provenance 
data written to the disk in a repository, there are scenarios where the data 
may be sensitive, compliance and regulatory requirements apply, or NiFi is running on 
hardware not under the direct control of the organization (cloud, etc.). In 
this case, the provenance repository allows for all data to be encrypted before 
being persisted to the disk.
    +
    +[WARNING]
    +.Performance
    +============
    +The current implementation of the encrypted provenance repository 
intercepts the record writer and reader of `WriteAheadProvenanceRepository`, 
which offers significant performance improvements over the legacy 
`PersistentProvenanceRepository` and uses the `AES/GCM` algorithm, which is 
fairly performant on commodity hardware. In most scenarios, the added cost will 
not be significant (unnoticeable on a flow with hundreds of provenance events 
per second, moderately noticeable on a flow with thousands - tens of thousands 
of events per second). However, administrators should perform their own risk 
assessment and performance analysis and decide how to move forward. Switching 
back and forth between encrypted/unencrypted implementations is not recommended 
at this time.
    +============
    +
    +==== What is it?
    +
    +The `EncryptedWriteAheadProvenanceRepository` is a new implementation of 
the provenance repository which encrypts all event record information before it 
is written to the repository. This allows for storage on systems where OS-level 
access controls are not sufficient to protect the data while still allowing 
querying and access to the data through the NiFi UI/API.
    +
    +==== How does it work?
    +
    +The `WriteAheadProvenanceRepository` was introduced in NiFi 1.2.0 and 
provided a refactored and much faster provenance repository implementation than 
the previous `PersistentProvenanceRepository`. The encrypted version wraps that 
implementation with a record writer and reader which encrypt and decrypt the 
serialized bytes respectively.
    +
    +The fully qualified class 
`org.apache.nifi.provenance.EncryptedWriteAheadProvenanceRepository` is 
specified as the provenance repository implementation in `nifi.properties` as 
the value of `nifi.provenance.repository.implementation`. In addition, 
<<administration-guide.adoc#encrypted-write-ahead-provenance-repository-properties,new
 properties>> must be populated to allow successful initialization.
    +
    +===== StaticKeyProvider
    +The `StaticKeyProvider` implementation defines keys directly in 
`nifi.properties`. Individual keys are provided in hexadecimal encoding (can 
also be encrypted like any other sensitive property in `nifi.properties` using 
the <<administration-guide.adoc#encrypt-config_tool,`./encrypt-config.sh`>> 
tool in the NiFi Toolkit).
    +
    +The following configuration section would result in a key provider with 
two available keys, "Key1" (active) and "AnotherKey".
    +....
    +nifi.provenance.repository.encryption.key.provider.implementation=org.apache.nifi.provenance.StaticKeyProvider
    +nifi.provenance.repository.encryption.key.id=Key1
    +nifi.provenance.repository.encryption.key=0123456789ABCDEFFEDCBA98765432100123456789ABCDEFFEDCBA9876543210
    +nifi.provenance.repository.encryption.key.id.AnotherKey=0101010101010101010101010101010101010101010101010101010101010101
    +....
    +
    +===== FileBasedKeyProvider
    +The `FileBasedKeyProvider` implementation reads from an encrypted 
definition file of the format:
    +
    +....
    +key1=NGCpDpxBZNN0DBodz0p1SDbTjC2FG5kp1pCmdUKJlxxtcMSo6GC4fMlTyy1mPeKOxzLut3DRX+51j6PCO5SznA==
    +key2=GYxPbMMDbnraXs09eGJudAM5jTvVYp05XtImkAg4JY4rIbmHOiVUUI6OeOf7ZW+hH42jtPgNW9pSkkQ9HWY/vQ==
    +key3=SFe11xuz7J89Y/IQ7YbJPOL0/YKZRFL/VUxJgEHxxlXpd/8ELA7wwN59K1KTr3BURCcFP5YGmwrSKfr4OE4Vlg==
    +key4=kZprfcTSTH69UuOU3jMkZfrtiVR/eqWmmbdku3bQcUJ/+UToecNB5lzOVEMBChyEXppyXXC35Wa6GEXFK6PMKw==
    +key5=c6FzfnKm7UR7xqI2NFpZ+fEKBfSU7+1NvRw+XWQ9U39MONWqk5gvoyOCdFR1kUgeg46jrN5dGXk13sRqE0GETQ==
    +....
    +
    +Each line defines a key ID and then the Base64-encoded cipher text of a 16 
byte IV and wrapped AES-128, AES-192, or AES-256 key depending on the JCE 
policies available. The individual keys are wrapped by AES/GCM encryption using 
the **master key** defined by `nifi.bootstrap.sensitive.key` in 
`conf/bootstrap.conf`.
    +
    +===== Key Rotation
    +Simply update `nifi.properties` to reference a new key ID in 
`nifi.provenance.repository.encryption.key.id`. Previously-encrypted events can 
still be decrypted as long as that key is still available in the key definition 
file or `nifi.provenance.repository.encryption.key.id.<OldKeyID>` as the key ID 
is serialized alongside the encrypted record.
    +
    +==== Writing and Reading Event Records
    +Once the repository is initialized, all provenance event record write 
operations are serialized according to the configured schema writer 
(`EventIdFirstSchemaRecordWriter` by default for 
`WriteAheadProvenanceRepository`) to a `byte[]`. Those bytes are then encrypted 
using an implementation of `ProvenanceEventEncryptor` (the only current 
implementation is `AES/GCM/NoPadding`) and the encryption metadata (`keyId`, 
`algorithm`, `version`, `IV`) is serialized and prepended. The complete 
`byte[]` is then written to the repository on disk as normal.
    +
    +image:encrypted-wapr-hex.png["Encrypted provenance repository file on 
disk"]
    +
    +On record read, the process is reversed. The encryption metadata is parsed 
and used to decrypt the serialized bytes, which are then deserialized into a 
`ProvenanceEventRecord` object. The delegation to the normal schema record 
writer/reader allows for "random-access" (i.e. immediate seek without 
decryption of unnecessary records).
    +
    +Within the NiFi UI/API, there is no detectable difference between an 
encrypted and unencrypted provenance repository. The Provenance Query 
operations work as expected with no change to the process.
    +
    +==== Potential Issues
    +* Switching between unencrypted and encrypted repositories
    +** If a user has an existing repository that is not encrypted and switches 
their configuration to use an encrypted repository, the application writes an 
error to the log but starts up. However, previous events are not accessible 
through the provenance query interface and new events will overwrite the 
existing events. The same behavior occurs if a user switches from an encrypted 
repository to an unencrypted repository. Automatic roll-over is a future effort 
(https://issues.apache.org/jira/browse/NIFI-3722[NIFI-3722]) but NiFi is not 
intended to be long-term storage for provenance events so the impact should be 
minimal.
    +*** Logic to handle encrypted -> unencrypted seamlessly as long as the key 
provider available still has the keys used to encrypt the events (see **Key 
Rotation**)
    +*** Logic to handle unencrypted -> encrypted seamlessly as the previously 
recorded events simply need to be read with a plaintext schema record reader 
and then written back with the encrypted record writer
    --- End diff --
    
    Feel like these two sub-bullets need some introductory text to make a 
smooth transition from the previous section.
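
    For anyone reading the "Writing and Reading Event Records" and "Key 
Rotation" text in the diff above, here is a rough Java sketch of the idea: 
encrypt the serialized bytes with `AES/GCM/NoPadding`, prepend the key ID and 
IV, and look the key up by the stored ID on read. The class name 
`EncryptedRecordSketch`, the in-memory key map, and the byte layout are all 
assumptions made for illustration. This is not NiFi's `ProvenanceEventEncryptor` 
or its on-disk format, and the `algorithm` and `version` metadata fields are 
left out for brevity.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: a hypothetical stand-in, not NiFi's implementation. */
public class EncryptedRecordSketch {
    private static final int IV_LENGTH = 16;        // bytes, as in the "16 byte IV" described in the docs
    private static final int TAG_LENGTH_BITS = 128; // GCM authentication tag size

    // Hypothetical in-memory key provider: key ID -> raw AES key bytes.
    private final Map<String, byte[]> keys = new HashMap<>();
    private final SecureRandom random = new SecureRandom();

    public void addKey(String keyId, byte[] key) {
        keys.put(keyId, key.clone());
    }

    /** Encrypts the serialized record and prepends the metadata (key ID length, key ID, IV). */
    public byte[] encrypt(byte[] serialized, String keyId) throws Exception {
        byte[] key = keys.get(keyId);
        if (key == null) {
            throw new IllegalArgumentException("Unknown key ID: " + keyId);
        }
        byte[] iv = new byte[IV_LENGTH];
        random.nextBytes(iv); // fresh IV for every record
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                new GCMParameterSpec(TAG_LENGTH_BITS, iv));
        byte[] cipherText = cipher.doFinal(serialized);
        byte[] idBytes = keyId.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + idBytes.length + IV_LENGTH + cipherText.length)
                .putInt(idBytes.length).put(idBytes).put(iv).put(cipherText)
                .array();
    }

    /** Parses the prepended metadata, then decrypts with whichever key the record names. */
    public byte[] decrypt(byte[] stored) throws Exception {
        ByteBuffer buf = ByteBuffer.wrap(stored);
        byte[] idBytes = new byte[buf.getInt()];
        buf.get(idBytes);
        String keyId = new String(idBytes, StandardCharsets.UTF_8);
        byte[] key = keys.get(keyId);
        if (key == null) {
            throw new IllegalStateException("Key no longer available: " + keyId);
        }
        byte[] iv = new byte[IV_LENGTH];
        buf.get(iv);
        byte[] cipherText = new byte[buf.remaining()];
        buf.get(cipherText);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
                new GCMParameterSpec(TAG_LENGTH_BITS, iv));
        return cipher.doFinal(cipherText);
    }
}
```

    Because the key ID travels with each record, a record written under an 
older key stays readable after the active key changes, provided the old key 
is still registered. That is the property the Key Rotation paragraph relies on.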

