I've been using NiFi's Docker image for a while now and thought a few notes
from the things we've done might be useful for your work:
- Using Docker Swarm (NiFi 1.9.2)
  - Had to add some property file updates as part of a custom Dockerfile
    build because the start.sh didn't cover them (some of these might
    have already been addressed):
    - `nifi.cluster.protocol.is.secure` needs to be set to true for
      secure clusters
    - allow for multiple NODE_IDENTITY entries to be specified in
      authorizers.xml via environment variables (e.g. NODE_IDENTITY_1,
      NODE_IDENTITY_2, etc.) - add as "Node Identity" and "Initial User
      Identity" elements
    - allow configuration of LDAP in authorizers.xml
      - uncommenting sections of the file
      - replacing element values/attributes with environment variables
      - add User Group Providers (we had a composite of LDAP and File
        based)
    - update nifi.properties to set `nifi.security.identity.mapping`
      related properties for LDAP <-> PKI mappings
    - update nifi.properties to set appropriate
      `nifi.web.http.network.interface`/`nifi.web.https.network.interface`
      related entries that were found to be required to enable
      clustering, site-to-site and external connections in our Swarm
      setup (hosted across multiple AWS EC2s with two Swarm "networks"
      in play)
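
As an illustration only (not the scripts referred to above), here is a rough
Python sketch of the NODE_IDENTITY_<n> idea: collect the numbered environment
variables and add them to authorizers.xml as "Node Identity <n>" and "Initial
User Identity <n>" properties. The function names are hypothetical, and it
assumes a single userGroupProvider and accessPolicyProvider with no clashing
property numbers; a real script would select providers by their identifier.

```python
import re
import xml.etree.ElementTree as ET

# Matches NODE_IDENTITY_1, NODE_IDENTITY_2, ... (naming taken from the list above).
NODE_IDENTITY_VAR = re.compile(r"^NODE_IDENTITY_(\d+)$")


def collect_node_identities(env):
    """Return the NODE_IDENTITY_<n> values from env, ordered by <n>."""
    found = []
    for key, value in env.items():
        match = NODE_IDENTITY_VAR.match(key)
        if match and value:
            found.append((int(match.group(1)), value))
    return [value for _, value in sorted(found)]


def add_identity_properties(provider, prefix, identities, start=1):
    """Append <property name="<prefix> <n>">identity</property> children."""
    for offset, identity in enumerate(identities):
        name = f"{prefix} {start + offset}"
        prop = ET.SubElement(provider, "property", {"name": name})
        prop.text = identity


def apply_node_identities(authorizers_xml, env, start=1):
    """Return authorizers XML text with the node identities added.

    Assumption: only the first userGroupProvider / accessPolicyProvider is
    updated. Note that ElementTree drops XML comments when parsing.
    """
    root = ET.fromstring(authorizers_xml)
    identities = collect_node_identities(env)
    targets = (
        ("userGroupProvider", "Initial User Identity"),
        ("accessPolicyProvider", "Node Identity"),
    )
    for tag, prefix in targets:
        provider = root.find(tag)
        if provider is not None:
            add_identity_properties(provider, prefix, identities, start)
    return ET.tostring(root, encoding="unicode")
```

At container start this would be called with os.environ and the contents of
conf/authorizers.xml; pass a higher `start` if the file already has numbered
entries (e.g. one for the initial admin).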
Having been through some of the pain above, we later moved to a Kubernetes
stack and re-implemented some of our approach. One decision we made was to
inject properties/configuration files instead of using the environment
variable replacements via start.sh (because so many things we wanted
weren't covered and we didn't want to continue trying to update the
provided start.sh via sed/awk commands in our Dockerfile to add more
commands as part of the container startup routine).
- Using Kubernetes (NiFi 1.11.4)
  - custom Dockerfile that overrides the start.sh scripts to provide:
    - overwrite of "static" config files injected into the k8s
      StatefulSet (i.e. everything under conf/ that isn't generated at
      startup)
      - we set non-dynamic & non-secure values in these files within
        our git repo then inject them into the pod
    - set dynamic properties, e.g. hostnames (for `nifi.web.https.host`),
      similar to the provided start.sh script but a different set of
      properties, as what we need is different to what it provides
    - create nifi-toolkit properties files (e.g. setting `baseUrl` and
      `proxiedEntity`, etc. based on hostname & env vars)
    - set secure properties (e.g. encryption.keys) that have been
      provided as files/env vars by k8s/STS
    - add "Node Identity"/"Initial User Identity" entries based on the
      k8s/STS setup (i.e. number of nodes in the cluster)
    - set up "Initial Admin Identity" (based on env var)
    - request node & initial admin certificates from a nifi-toolkit
      instance (running in server mode) then configure them in
      nifi.properties & nifi-toolkit properties
    - create "common" keystore & truststore files in a known location
      with a common password on each cluster node - this is required so
      we can configure S2S reporting tasks with an SSL Controller
      Service (that can only take a single file and password
      combination so has to be common across all nodes)
    - use nifi-toolkit to encrypt conf files (after they've been updated)
    - delete unwanted NARs from lib/
    - download required extra (apache-nifi) NARs
  - we have persisted volumes for
    - some logs (that we don't output to STDOUT)
    - persisted configuration, e.g. flow.xml.gz, users.xml,
      authorizations.xml
    - each of the repositories
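
To make the "set dynamic properties" and "entries based on the k8s/STS setup"
steps a little more concrete, here is a minimal Python sketch (again not the
actual startup script). `set_properties` and `sts_node_hosts` are hypothetical
helpers, and the StatefulSet, headless service and namespace names in the
usage note are made-up examples; the hostname pattern is the standard
StatefulSet pod DNS form.

```python
def set_properties(text, updates):
    """Return properties text with each key in `updates` set to its value.

    Existing `key=value` lines are rewritten in place, keys that are not
    present are appended, and comments/blank lines are left untouched.
    """
    remaining = dict(updates)
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#") and "=" in stripped:
            key = stripped.split("=", 1)[0].strip()
            if key in remaining:
                line = f"{key}={remaining.pop(key)}"
        out.append(line)
    # Anything not found in the file is added at the end.
    out.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"


def sts_node_hosts(sts_name, headless_service, namespace, replicas,
                   cluster_domain="cluster.local"):
    """Hostnames of the pods in a StatefulSet with `replicas` nodes.

    Pods are named <sts_name>-<ordinal> and resolve under the governing
    headless service: <pod>.<service>.<namespace>.svc.<cluster_domain>.
    """
    return [
        f"{sts_name}-{ordinal}.{headless_service}.{namespace}.svc.{cluster_domain}"
        for ordinal in range(replicas)
    ]
```

For example, `sts_node_hosts("nifi", "nifi-headless", "dataflow", 3)` gives
the three hostnames from which node identities could be derived, and
`set_properties(text, {"nifi.web.https.host": host})` sets the per-pod
hostname in the contents of nifi.properties.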
Retrospectively (things always look wrong when you look back, right? 😊),
some of the stuff we've done with our custom startup scripts would have
probably been better as init-containers (e.g. requesting certificates,
dynamic config changes), but things that might be worth considering from a
NiFi Docker point of view:
- cut-down image in terms of NARs with a way to inject/download extra
NARs as required at startup/as part of a custom build; but that said, the
current base is probably fine and anyone wanting to delete NARs should do
so with their own custom build, as we have
- providing a "base" set of config files but allowing for overrides
using files in a known directory; here I'm thinking mainly of things like
bootstrap.conf, where you could have a conf/conf.d/01-bootstrap.conf file
to provide extra JVM args, similar to Elasticsearch jvm.options.d
<https://www.elastic.co/guide/en/elasticsearch/reference/current/jvm-options.html>
setup
- as you already mentioned, more property/config settings via
environment variables
- ability to change logging config (again could this be done with
additional files in a separate directory maybe?)
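
To show what the conf.d override idea could look like, here is a small Python
sketch. This is a proposal only: NiFi does not read a conf/conf.d directory
as far as this thread is concerned, and the function names, the "*.conf"
pattern and the "later file wins" rule are all assumptions for illustration.

```python
from pathlib import Path


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def merge_overrides(base_text, overrides):
    """Merge override files over the base config.

    `overrides` maps file name to file contents. Files are applied in
    file-name order (01-..., 02-...), so later files win on conflicts.
    """
    merged = parse_properties(base_text)
    for _name, text in sorted(overrides.items()):
        merged.update(parse_properties(text))
    return merged


def load_conf(base_file, override_dir):
    """Read a base file plus any *.conf files in override_dir from disk."""
    directory = Path(override_dir)
    overrides = {}
    if directory.is_dir():
        overrides = {p.name: p.read_text() for p in directory.glob("*.conf")}
    return merge_overrides(Path(base_file).read_text(), overrides)
```

With a base bootstrap.conf containing `java.arg.3=-Xmx512m` and a
conf/conf.d/01-bootstrap.conf containing `java.arg.3=-Xmx4g`, the merged
result carries the larger heap setting without the base file being edited.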
*Chris Sampson*
IT Consultant
[email protected]
On Wed, 3 Jun 2020 at 13:57, Shawn Weeks <[email protected]> wrote:
> I’m working on deploying NiFi to Kubernetes and I’ve ran across several
> things that could be improved.
>
>
> 1. Currently flow.xml.gz is stored in ./conf by default which has been
> designated a Docker volume. In Kubernetes volumes are not pre-populated
> from the image so I’m left with some init container magic to copy the
> contents of ./conf to another volume and then back again otherwise ./conf
> is empty. Since we’re configuring everything via environment variables
> anyway setting nifi.flow.configuration.file and designate a volume just for
> flow.xml.gz would solve that. You could even reuse your existing conf
> volume if you haven’t changed anything.
> 2. Expose more variables - NIFI-6232 already exists for this but hasn’t
> had any work.
> 3. Support OpenID Login Provider
> 4. Expose logs besides nifi-app.log
>
>
>