markap14 commented on pull request #5059:
URL: https://github.com/apache/nifi/pull/5059#issuecomment-838659499


   Thanks for the contribution @simonbence! Looking through this i think ideas 
are sound. However, this is updating the `nifi-api` module, and once it's been 
released it's permanent. So we need to be extremely careful about anything that 
we put in there. Naming, Exceptions that should be thrown, and the correct 
level of abstraction. So I have some thoughts around this.
   
   - What this 'extension' is really doing has nothing to do with Auto-Loading. 
It has to do with sourcing the NAR's from some other location. And given the 
interface provided in the PR, things like `start()` and `stop()` don't make a 
lot of sense to me, to be honest. When I see that, I feel like the component 
should be starting some background thread or something maybe so that it can 
poll? But then the background thread is created elsewhere. I think what would 
be a more appropriate level of abstraction would be:
   ```
   public interface NarProvider {
       /**
        * Initializes the NAR Provider based on the given set of properties
        */
       void initialize(NarProviderInitializationContext context);
   
       /**
        * Performs a listing of all NAR's that are available.
        */
       Collection<String> listNars() throws IOException;
   
       /**
        * Fetches the NAR at the given location. The location should be one of 
the values returned by listNars().
        */
       InputStream fetchNarContents(String location) throws IOException;
   }
   ```
   And here, `NarProviderInitializlationContext` would perhaps just have a 
single method:
   ```
   Map<String, String> getProperties();
   ```
   But this abstraction gives us the freedom to update it in the future.
   
   I also think it's important to consider how those properties get provided to 
the implementation. The way that we typically handle something like this within 
NiFi would be to have the framework look at `nifi.properties` and pull out the 
properties appropriate for the instance. So, perhaps we would look for any 
property that has the format:
   ```
   nifi.library.nar.autoload.<name>.implementation
   nifi.library.nar.autoload.<name>.XYZ
   ```
   And then build up a Map<String, String> where the key is the property name 
after <name> and the value is the property value. For example, if we have in 
nifi.properties:
   ```
   
nifi.library.nar.autoload.hdfs.implementation=org.apache.nifi.hdfs.FetchNarHdfs
   nifi.library.nar.autoload.hdfs.kerberos=username
   nifi.library.nar.autoload.hdfs.hello=world
   ```
   This would produce a Map that could be provided to the `initialize` method 
(via the context):
               kerberos => username
               hello => world
   (No need to include `implementation` necessarily because it's used by the 
framework to know the name of the class to use but doesn't really need to be 
passed to the class itself).
   
   A few additional thoughts on the implementation:
   - With this approach, we only need to have extensions for determining what's 
available and downloading it. The Auto-Loader should not be pluggable, and it 
should be responsible for calling out to each NarProvider to get a listing, and 
then triggering the download of each NAR.
   - It makes sense to support multiple Nar Providers, even multiple of the 
same type. It's very reasonable to imagine an organization in which there are 
multiple teams hosting nars in different locations, and a user may want to 
fetch from multiple directories, etc. This would all work nicely with the above 
property naming scheme.
   - We should always include debug-level logging indicating which NAR's were 
returned when performing the listing, and this should indicate how long it took 
to perform that listing.
   - Should log at an INFO level each time a NAR is downloaded and how long 
that took
   - NAR's should be written with a dot-name (`.my-bundle.nar`) and then 
renamed so that the Extension Loader is able to detect it only when it's done. 
It probably would make sense to download into a sub-directory of the 
`extensions` directory. I.e., if the nar loader is configured to monitor the 
`./extensions` directory we could create a directory `./extensions/hdfs` and 
then update the Nar Loader to also scan that directory. This would make it 
easier to understand where each of the NAR's came from if we have multiple 
NarProviders.
   - It's possible that over time users will continually add new versions of a 
NAR to the same directory. And over time this will result in loading massive 
amounts of NAR's into NiFi. Those that get removed will remain in NiFi. We will 
need some way to age off old NARs. Perhaps after performing a listing, we 
should detect any nar's that were previously downloaded from that NarProvider 
and not present in the Listing. Then, if any of those is not in use (i.e., no 
extensions are in the flow that use that bundle) then we can remove the bundle 
and its NarLoader.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to