Hi Andre,

Your plan seems reasonable to me. The shortest path to verifying it might be to drop in the pentaho-hdfs-vfs artifacts and remove any conflicting VFS providers (or just their config (1)) from the Apache VFS jars.

Some of the recent effort in VFS has gone into fixing bugs that were relevant to Apache Accumulo. From what I heard about that work, it sounds like VFS may have a fairly small team.

That said, it might be worth asking the Pentaho folks 1) if they could contribute their project to VFS and 2) how they leverage it. They might have some guidance about how to use their project as a replacement for the HDFS parts of VFS.

Good luck!

Jim

1. I'd peek at files like this: https://github.com/pentaho/pentaho-hdfs-vfs/blob/master/res/META-INF/vfs-providers.xml.

On 5/29/2016 8:10 AM, Andre wrote:
All,

Not sure how many other MapR users are effectively using NiFi (I only know
two others) but, as you may remember from old threads, integrating some
different flavours of HDFS-compatible APIs can sometimes be puzzling and
require recompilation of bundles.

However, recompilation doesn't solve scenarios where, for whatever reason, a
user may want to use more than one HDFS provider (e.g. MapR + HDP, or
Isilon + MapR) and the HDFS versions are distinct (e.g.

While WebHDFS and HttpFs are good palliative solutions to part of this
issue, they have their own limitations, the most striking being the
need to create Kerberos proxy users to run those services [1] and potential
bottlenecks [2].

I was wondering if we could tap into the work Pentaho did around using a
fork of Apache VFS as an option to solve this issue and also to unify the
.*MapR and .*HDFS processors.[*]

Pentaho's code is Apache Licensed and is available here:

https://github.com/pentaho/pentaho-hdfs-vfs/blob/master/src/org/pentaho/hdfs/vfs/

As you can see, VFS acts as a middleman between the application and the API
being used to access the "HDFS" backend. I have used Pentaho before and know
that this functionality works reasonably well.
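
To make the middleman idea a bit more concrete, here is a minimal sketch in
plain Java of dispatching on the URI scheme. It is only an illustration under
stated assumptions: the Provider and Registry names are hypothetical and are
not Commons VFS or Pentaho APIs, and the "maprfs" scheme and host names are
placeholders.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

public class SchemeDispatchSketch {

    /** Hypothetical stand-in for a VFS provider; not a real Commons VFS type. */
    interface Provider {
        String describe(URI uri);
    }

    /** Hypothetical registry: one provider per URI scheme. */
    static final class Registry {
        private final Map<String, Provider> byScheme = new HashMap<>();

        /** Registers a provider and rejects a second provider for the same scheme. */
        void register(String scheme, Provider provider) {
            if (byScheme.putIfAbsent(scheme, provider) != null) {
                throw new IllegalStateException(
                        "conflicting provider for scheme: " + scheme);
            }
        }

        /** Picks the provider from the scheme, so callers never name a backend. */
        String resolve(String location) {
            URI uri = URI.create(location);
            Provider provider = byScheme.get(uri.getScheme());
            if (provider == null) {
                throw new IllegalArgumentException(
                        "no provider for scheme: " + uri.getScheme());
            }
            return provider.describe(uri);
        }
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        // Each lambda stands in for a backend-specific client library.
        registry.register("hdfs", uri -> "Hadoop client, path " + uri.getPath());
        registry.register("maprfs", uri -> "MapR client, path " + uri.getPath());

        System.out.println(registry.resolve("hdfs://namenode:8020/data/in.csv"));
        System.out.println(registry.resolve("maprfs:///data/in.csv"));
    }
}
```

In this sketch the application code only ever sees a URI, and a second
registration for the same scheme fails fast instead of silently shadowing
the first one.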


Any thoughts?





[1] required when the file owner differs from the user running the API
endpoint
[2] HttpFs
[*] Ideally VFS upstream could be offered a PR to address this, but I am not
sure how feasible it is to achieve this.

