All,

Not sure how many other MapR users are effectively using NiFi (I only know two others), but as you may remember from old threads, integrating different flavours of HDFS-compatible APIs can sometimes be puzzling and require recompilation of bundles.
However, recompilation doesn't solve scenarios where, for whatever reason, a user may want to use more than one HDFS provider (e.g. MapR + HDP, or Isilon + MapR) and the HDFS versions are distinct.

While WebHDFS and HttpFs are good palliative solutions to some of these issues, they have their own limitations, the most striking being the need to create Kerberos proxy users to run those services [1] and potential bottlenecks [2].

I was wondering if we could tap into the work Pentaho did around using a fork of Apache VFS, both as an option to solve this issue and to unify the .*MapR and .*HDFS processors. [*]

Pentaho's code is Apache licensed and is available here:

https://github.com/pentaho/pentaho-hdfs-vfs/blob/master/src/org/pentaho/hdfs/vfs/

As you can see, VFS acts as a middleman between the application and the API used to access the "HDFS" backend. I have used Pentaho before and know that this functionality works reasonably well.

Any thoughts?

[1] Required if file ownership does not match the user running the API endpoint.
[2] HttpFs, since it funnels all traffic through a single gateway service.
[*] Ideally, upstream VFS could be offered a PR to address this, but I'm not sure how feasible that would be.
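P.S. To make the idea concrete, here's a minimal sketch of what VFS-mediated access looks like using the stock Apache Commons VFS2 API (not Pentaho's fork); it assumes the HDFS provider and Hadoop client jars are on the classpath, and the host, port and path are placeholders:

import org.apache.commons.vfs2.FileObject;
import org.apache.commons.vfs2.FileSystemManager;
import org.apache.commons.vfs2.VFS;

public class VfsHdfsSketch {
    public static void main(String[] args) throws Exception {
        // VFS selects the provider from the URI scheme, so swapping
        // MapR for HDP (or any other "HDFS" flavour) becomes a matter
        // of registering a different provider for the scheme rather
        // than recompiling the processor bundle.
        FileSystemManager fsManager = VFS.getManager();
        FileObject file = fsManager.resolveFile("hdfs://namenode:8020/user/nifi/sample.txt");
        System.out.println("exists: " + file.exists());
        System.out.println("size:   " + file.getContent().getSize());
        file.close();
    }
}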
