[
https://issues.apache.org/jira/browse/NIFI-13077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
James Guzman (Medel) updated NIFI-13077:
----------------------------------------
Description:
We currently have the concept of *ExternalResourceProvider* with two
implementations (HDFS and NiFi Registry) that can be configured to list and
download all NARs made available in those locations. Those implementations, if
configured, are started when NiFi starts and download ALL of the available
NARs; a background thread then checks every five minutes for new NARs to
download.
The proposal here is to have a similar concept that would focus on extensions /
components. Instead of having a background thread download all of the
components, the approach would be to plug this into the *ExtensionBuilder*.
When a component cannot be instantiated with locally available components
(when loading a flow definition), the Extension Providers would be queried with
the specific coordinates rather than just creating a ghost component. If a
provider makes the component available, the NAR would be downloaded (alongside
its required dependencies if the NAR depends on another NAR).
This approach already exists in the *Kafka Connect NiFi plugin* with the class
{*}ExtensionClientDefinition{*}. By adopting this approach in NiFi, it’d be
much easier to ship a much *smaller version of NiFi* and have NiFi download the
required components based on flows that are being instantiated / deployed.
Downloading the NAR would not be a blocking operation: we would still create a
ghost component, but once the NAR(s) are downloaded and the components loaded,
the flows would be fully operational.
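To illustrate the non-blocking behavior, here is a minimal sketch using
CompletableFuture. The names ComponentState and PendingComponent are
hypothetical and only serve the example; the real framework would swap the
ghost for the actual component instead of flipping a flag.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical states for a component whose NAR is not available locally yet.
enum ComponentState { GHOST, LOADED }

// Hypothetical holder: starts as a ghost and is promoted once the NAR is there.
final class PendingComponent {
    private final AtomicReference<ComponentState> state =
            new AtomicReference<>(ComponentState.GHOST);

    ComponentState state() {
        return state.get();
    }

    // Runs the download on another thread and returns immediately.
    // The state only changes to LOADED if the download completes normally;
    // on failure the component simply stays a ghost.
    CompletableFuture<Void> downloadInBackground(Runnable downloadNar) {
        return CompletableFuture.runAsync(downloadNar)
                .thenRun(() -> state.set(ComponentState.LOADED));
    }
}
```

The caller gets the future back right away, so flow loading is never held up
by a slow or failing provider.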
It might be possible to show something similar to what we do for the Python
extensions, where we show that the component is still in the process of
downloading third-party dependencies.
While this is a great opportunity to reduce the size of the NiFi binary (and
associated container image), it would not be great from a user perspective when
designing flows because all of the NARs removed from the default image would no
longer be visible in the list of available components when adding, for example,
a processor to the canvas.
Longer term we could imagine that the extension providers can also implement a
listing API so that when showing the list of available components, we would
show the list of the components available locally as well as the components
available through the extension providers. The listing of components could add
another column to indicate the source of the component.
This is something that is exposed for the Extension Bundles in the NiFi
Registry (we also have the information about the NiFi API version that has been
used for building the components so we could use this information to only list
components that should be compatible from an API standpoint - same major
version but lower or equal API version).
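The compatibility rule mentioned above (same major version, API version lower
than or equal to the running one) could be sketched as follows. The class name
ApiCompatibility is hypothetical, and the sketch assumes plain dotted numeric
versions with an optional qualifier such as -SNAPSHOT.

```java
import java.util.Arrays;

// Hypothetical helper; assumes versions look like "major.minor.patch[-qualifier]".
final class ApiCompatibility {
    private ApiCompatibility() {
    }

    // Parses the numeric part into exactly three ints, padding with zeros.
    static int[] parse(String version) {
        String numeric = version.split("-", 2)[0]; // drop "-SNAPSHOT" and similar
        int[] parts = Arrays.stream(numeric.split("\\."))
                .mapToInt(Integer::parseInt)
                .toArray();
        return Arrays.copyOf(parts, 3);
    }

    // True when the component was built against the same major API version
    // and an API version that is not newer than the one NiFi is running.
    static boolean isCompatible(String builtAgainst, String running) {
        int[] built = parse(builtAgainst);
        int[] run = parse(running);
        if (built[0] != run[0]) {
            return false;
        }
        return Arrays.compare(built, run) <= 0; // lexicographic comparison
    }
}
```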
The immediate goal though would be to introduce the concept of
ExtensionProvider with the following APIs:
{code:java}
boolean isAvailableExtension(Coordinates)
void downloadExtension(Coordinates)
{code}
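As a rough illustration of how these two methods could fit together, here is a
sketch with a toy in-memory implementation. The shape of Coordinates (group,
artifact, version) and the class InMemoryExtensionProvider are assumptions made
for the example, not part of the proposal.

```java
import java.util.HashSet;
import java.util.Set;

// Assumed shape of the coordinates: group, artifact id and version of a NAR.
record Coordinates(String group, String artifact, String version) {
}

// The two methods from the proposal, with parameter names added.
interface ExtensionProvider {
    boolean isAvailableExtension(Coordinates coordinates);

    void downloadExtension(Coordinates coordinates);
}

// Toy provider backed by a fixed catalog; it only records what was requested.
final class InMemoryExtensionProvider implements ExtensionProvider {
    private final Set<Coordinates> catalog;
    private final Set<Coordinates> downloaded = new HashSet<>();

    InMemoryExtensionProvider(Set<Coordinates> catalog) {
        this.catalog = Set.copyOf(catalog);
    }

    @Override
    public boolean isAvailableExtension(Coordinates coordinates) {
        return catalog.contains(coordinates);
    }

    @Override
    public void downloadExtension(Coordinates coordinates) {
        if (!isAvailableExtension(coordinates)) {
            throw new IllegalArgumentException("Not available: " + coordinates);
        }
        downloaded.add(coordinates); // a real provider would fetch the NAR file here
    }

    Set<Coordinates> downloaded() {
        return Set.copyOf(downloaded);
    }
}
```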
Longer term we could also consider something like:
{code:java}
List<Extensions> listExtensions(){code}
But we would need to figure out how a NAR can provide the information about the
components that are inside of it. The NiFi Registry provides this information,
but that would not be the case for a Maven-based implementation, for example.
In nifi.properties we would have something like:
{code:java}
nifi.nar.extension.provider.<identifier>.<property-name>{code}
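A sketch of how properties following this pattern could be grouped per provider
is shown below. The class ProviderProperties is hypothetical, and the sketch
assumes the identifier itself contains no dots (the property name may).

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical helper that groups provider properties by their identifier.
final class ProviderProperties {
    static final String PREFIX = "nifi.nar.extension.provider.";

    private ProviderProperties() {
    }

    // Returns identifier -> (property-name -> value) for all matching keys.
    static Map<String, Map<String, String>> groupByIdentifier(Properties properties) {
        Map<String, Map<String, String>> grouped = new TreeMap<>();
        for (String key : properties.stringPropertyNames()) {
            if (!key.startsWith(PREFIX)) {
                continue; // unrelated property
            }
            String rest = key.substring(PREFIX.length());
            int dot = rest.indexOf('.');
            if (dot <= 0 || dot == rest.length() - 1) {
                continue; // both an identifier and a property name are required
            }
            grouped.computeIfAbsent(rest.substring(0, dot), id -> new TreeMap<>())
                    .put(rest.substring(dot + 1), properties.getProperty(key));
        }
        return grouped;
    }
}
```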
We would then loop through all the configured providers to find the appropriate
NAR to download based on the coordinates provided in the flow definition that is
being instantiated (either from flow.json.gz, or an uploaded JSON flow
definition, or when checking out a flow from a registry client).
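The lookup across providers could look like the sketch below, where the first
provider that reports the NAR as available is asked to download it. The
ExtensionResolver class and the shape of Coordinates are assumptions for
illustration; the order in which providers are tried is also an open choice.

```java
import java.util.List;
import java.util.Optional;

// Assumed shape of the coordinates: group, artifact id and version of a NAR.
record Coordinates(String group, String artifact, String version) {
}

// The two methods from the proposal, with parameter names added.
interface ExtensionProvider {
    boolean isAvailableExtension(Coordinates coordinates);

    void downloadExtension(Coordinates coordinates);
}

// Hypothetical resolver that tries the configured providers in order.
final class ExtensionResolver {
    private final List<ExtensionProvider> providers;

    ExtensionResolver(List<ExtensionProvider> providers) {
        this.providers = List.copyOf(providers);
    }

    // Triggers the download on the first provider offering the NAR and
    // returns that provider, or empty when no provider has it.
    Optional<ExtensionProvider> resolve(Coordinates coordinates) {
        for (ExtensionProvider provider : providers) {
            if (provider.isAvailableExtension(coordinates)) {
                provider.downloadExtension(coordinates);
                return Optional.of(provider);
            }
        }
        return Optional.empty();
    }
}
```

When resolve returns empty, the component would stay a ghost, which matches
today's behavior for a missing NAR.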
> On-demand Extension Provider
> ----------------------------
>
> Key: NIFI-13077
> URL: https://issues.apache.org/jira/browse/NIFI-13077
> Project: Apache NiFi
> Issue Type: Epic
> Components: Core Framework
> Reporter: Pierre Villard
> Priority: Major