Hi all - thank you for all the thoughtful feedback.

Regarding my original question, I think the patterns Mike outlined would be 
good enough.
That said, we're not going to move forward using NiFi for the project, and I 
figured I'd take a step back to explain where we were coming from, as some may 
find the perspective useful. Or not :)


We have a project that needs some data transformation. Input is Excel, output 
is multiple CSVs or POSTs of data to an API. On the surface, simple enough.

Our input Excel can and will change a lot, so we'll need rapid iteration and 
testing.
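To make that concrete, here is a minimal sketch of the kind of transform step involved. It is purely illustrative: in the real project the rows would come from an Excel sheet (via something like openpyxl or pandas), but the sketch uses stdlib-only dicts and `csv` so it stays self-contained, and the column names are made up.

```python
import csv
import io

def rows_to_csv(rows, columns):
    """Normalize a list of row dicts to a CSV string with fixed columns."""
    buf = io.StringIO()
    # extrasaction="ignore" drops any spreadsheet columns we don't care about.
    writer = csv.DictWriter(buf, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    for row in rows:
        # Strip whitespace from values -- spreadsheet exports are messy.
        writer.writerow({k: str(row.get(k, "")).strip() for k in columns})
    return buf.getvalue()

# Hypothetical input row, as if read from one Excel line.
rows = [{"name": " Ada ", "dept": "Eng", "ignored": "x"}]
print(rows_to_csv(rows, ["name", "dept"]))
```

Because the transform is plain Python, iterating on a changing input format is a matter of editing a function and rerunning a test, which was the main point.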

The project architecture is container-based, currently consisting of a 
front-end Docker image, a back-end image, and a database image. ETL is intended 
to be a fourth. It can be orchestrated with Docker Compose, Kubernetes, or bare 
metal. The goal is to be turnkey and low friction.
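For the Compose case, that layout might look roughly like the fragment below. This is a sketch only: the service and image names are placeholders I've invented, not the project's real ones.

```yaml
# Sketch of the four-container layout; names are assumptions.
version: "3.8"
services:
  frontend:
    image: myproject/frontend
  backend:
    image: myproject/backend
    depends_on:
      - db
  db:
    image: postgres:12
  etl:
    image: myproject/etl
    depends_on:
      - db
```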

There were two reasons we didn't choose NiFi - the painful (read: long) Java 
deployment lifecycle for custom processing, and system complexity, particularly 
around deploying flow updates.

Regarding the pain of Java, I've partied with Java since 1.4, so I get it. But 
these days, if I have a data analyst/data engineer with lowish programming 
skills, I can't have them compiling and moving around jars, nor do I want to 
invest in building out the build/deploy pipeline. Platforms have really evolved 
(look especially at the cloud-native tools): code can be written "inline" in 
the UI and just deployed. A lot of this is due to dynamic languages (e.g. 
Python), but it can still be done with Java with behind-the-scenes compilation. 
Jupyter Notebook, for its many, many faults, is the way things are heading, 
and the kids love it.

I touched on updating flows above, but in NiFi my choices seemed to be to 
replace the flow.xml.gz file or use the NiFi Registry. My concern with the 
Registry was that it was yet another moving part, and even then I'd have to 
build in source control workflows. Here again, newer platforms have all this 
baked in.
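For completeness, the flow.xml.gz-replacement option can be sketched as a derived image. This is only one possible approach, assuming a prebuilt flow checked into the repo; the conf path is the default one from the stock NiFi image (as discussed in the quoted thread below).

```dockerfile
# Sketch: bake a prebuilt flow into a derived NiFi image.
# Assumes flow.xml.gz sits next to this Dockerfile in source control.
FROM apache/nifi:latest
COPY flow.xml.gz /opt/nifi/nifi-current/conf/flow.xml.gz
```

This versions the flow alongside the image, but it still leaves you wiring up the source-control workflow yourself, which was exactly the friction I was describing.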


In closing, I think there is definitely still a place for NiFi, especially on 
the enterprise side where stability, scale, and management are paramount. But I 
did want to share this, as the non-enterprise use cases I am describing will, 
over time, become the enterprise use cases, and the NiFi project would do well 
to evaluate its long-term strategy.

Thanks again for all the responses.
Best,
Kevin

On 2020/04/08 14:27:54, Kevin Telford <[email protected]> wrote: 
> Hi all – I have a two part question.
> 
> 
> 
> I’d like to run NiFi inside a container in order to deploy to various
> environments. As far as I can tell, the flow.xml.gz file is the main
> “source” if you will, for a NiFi data flow.
> 
> Q1) Is the flow.xml.gz file the “source” of a NiFi data flow, and if so, is
> it best practice to copy it to a new env in order to “deploy” a prebuilt
> flow? Or how best is this handled?
> 
> 
> 
> Given that Q1 is true, my challenge then becomes somewhat Docker-specific…
> 
> Situation:
> 
>    - In the Dockerfile we unzip the NiFi source (L62
>    
> <https://github.com/apache/nifi/blob/master/nifi-docker/dockerhub/Dockerfile#L62>)
>    and then create Docker volumes (L75
>    
> <https://github.com/apache/nifi/blob/master/nifi-docker/dockerhub/Dockerfile#L75>
>    specifically for the conf dir). Once the container starts all the normal
>    NiFi startup things happen, and /opt/nifi/nifi-current/conf/flow.xml.gz
>    created.
> 
> Complication:
> 
>    - In order to persist flow.xml.gz outside of the container, I would
>    normally mount the /opt/nifi/nifi-current/conf directory, however in this
>    case I cannot mount it on initialization because that will overwrite conf
>    config files with whatever directory I bind it to (Docker container
>    isolation ensures host -> container file precedence).
>    - I could mount to a running container, but this is less ideal due to
>    the various ways a container can be deployed.
>    - I could copy manually from the running container, but this is less
>    ideal as it’s on demand, and not always persisting latest.
> 
> Resolution:
> 
>    - I believe instead, we would ideally create a few flow config specific
>    env vars and use them to update our nifi.properties (via
>    
> https://github.com/apache/nifi/blob/master/nifi-docker/dockerhub/sh/start.sh),
>    i.e. NIFI_FLOW_CONFIG_FILE_LOCATION, NIFI_FLOW_CONFIG_ARCHIVE_ENABLED,
>    NIFI_FLOW_CONFIG_ARCHIVE_DIR and so on for all nifi.flow.configuration
>    props.
> 
> Q2) Would the above proposal be ideal? (add a few env vars to start.sh) –
> if so, happy to add a PR for the code and doc change. Or have others solved
> this a different way?
> 
> 
> 
> Best,
> 
> Kevin
> 
