I am observing one assumption in this thread. For some reason we are
implying all of these will be Hadoop-compatible file systems. They don't
always have an HDFS plugin, nor should that be a mandatory requirement.
I'd untie these completely from the Hadoop NAR. This allows for effective
MiNiFi interaction without the weight of the Hadoop libs, for example.
Massive size savings where it matters.

For deployment, it's easy enough for an admin to rely on a standard tar or
rpm if the NAR modules are already available in the distro (well, I won't
talk about the registry until it arrives). Mounting a common directory on
every node, or distributing additional jars everywhere plus configs, and
then keeping it all consistent across the cluster, is something which can
be avoided by simpler packaging.

Andrew

On Tue, Feb 21, 2017, 6:47 PM Andre <andre-li...@fucs.org> wrote:

> Andrew,
>
> Thank you for contributing.
>
> On 22 Feb 2017 10:21 AM, "Andrew Grande" <apere...@gmail.com> wrote:
>
> Andre,
>
> I came across multiple NiFi use cases where going through the HDFS layer
> and the fs plugin may not be possible, i.e. when no HDFS layer is present
> at all, so there is no NameNode to connect to.
>
>
> Not sure I understand what you mean.
>
>
> Another important aspect is operations. The current PutHDFS model with an
> additional jar location, well, it kinda works, but I very much dislike it.
> There are too many possibilities for human error, in addition to deployment
> pain, especially in a cluster.
>
>
> Fair enough. Would you mind expanding a bit on what sort of challenges
> currently apply in terms of cluster deployment?
>
>
> Finally, native object storage processors have features which may not even
> apply to the HDFS layer, e.g. Azure Storage also offers Table storage, etc.
>
>
> This is a very valid point, but I see that as an exception (in this case a
> NoSQL DB operating under the umbrella term of "storage").
>
> I perhaps should have made it more explicit, but the requirements are:
>
> - existence of a Hadoop-compatible interface
> - ability to handle files
>
> Again, thank you for the input, truly appreciated.
>
> Andre
>
> I agree consolidating various efforts is worthwhile, but only within the
> context of a specific storage solution, not 'unifying' them into a single
> layer.
>
> Andrew
>
> On Tue, Feb 21, 2017, 6:10 PM Andre <andre-li...@fucs.org> wrote:
>
> > dev,
> >
> > I was having a chat with Pierre around PR#379 and we thought it would be
> > worth sharing this with the wider group:
> >
> >
> > I recently noticed that we merged a number of PRs around scale-out and
> > cloud-based object stores into master.
> >
> > Would it make sense to consider adopting a pattern where Put/Get/ListHDFS
> > are used in tandem with implementations of the hadoop.filesystem
> > interfaces instead of creating new processors, except where a particular
> > deficiency/incompatibility in the hadoop.filesystem implementation exists?
> >
> > Candidates for removal / non-merge would be:
> >
> > - Alluxio (PR#379)
> > - WASB (PR#626)
> > - Azure* (PR#399)
> > - *GCP (recently merged as PR#1482)
> > - *S3 (although this has already been in the codebase, so it would have to
> >   be deprecated)
> >
> > The pattern would be pretty much the same as the one documented and
> > successfully deployed here:
> >
> > https://community.hortonworks.com/articles/71916/connecting-to-azure-data-lake-from-a-nifi-dataflow.html
> >
> > Which means that in the case of Alluxio, one would use the properties
> > documented here:
> >
> > https://www.alluxio.com/docs/community/1.3/en/Running-Hadoop-MapReduce-on-Alluxio.html
> >
> > While with Google Cloud Storage we would use the properties documented
> > here:
> >
> > https://cloud.google.com/hadoop/google-cloud-storage-connector
> >
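> > As a rough illustration of the mechanism (an untested sketch on my part;
> > the impl class names and the Alluxio master port are taken from the linked
> > docs and may differ per connector version):
> >
> > // What the *HDFS processors rely on under the hood: once fs.<scheme>.impl
> > // points at the connector class and the connector jar is on the
> > // classpath, the same Hadoop FileSystem API serves any Hadoop-compatible
> > // store.
> > import org.apache.hadoop.conf.Configuration;
> > import org.apache.hadoop.fs.FileSystem;
> > import org.apache.hadoop.fs.Path;
> > import java.io.OutputStream;
> > import java.net.URI;
> > import java.nio.charset.StandardCharsets;
> >
> > public class HcfsSketch {
> >     public static void main(String[] args) throws Exception {
> >         Configuration conf = new Configuration();
> >         // e.g. Alluxio, per their docs:
> >         conf.set("fs.alluxio.impl", "alluxio.hadoop.FileSystem");
> >         // e.g. Google Cloud Storage, per the connector docs:
> >         // conf.set("fs.gs.impl",
> >         //     "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem");
> >
> >         FileSystem fs =
> >             FileSystem.get(URI.create("alluxio://master:19998/"), conf);
> >         try (OutputStream out = fs.create(new Path("/example.txt"))) {
> >             out.write("hello".getBytes(StandardCharsets.UTF_8));
> >         }
> >     }
> > }
> >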
> > I noticed that specific processors could have the ability to handle
> > properties particular to a filesystem; however, I would like to believe
> > the same issue would plague Hadoop users, and therefore it is reasonable
> > to believe the Hadoop-compatible implementations would have ways of
> > exposing those properties as well?
> >
> > In case those properties are exposed, we could perhaps simply adjust the
> > *HDFS processors to use dynamic properties to pass them to the underlying
> > module, thereby providing a way to expose particular settings of the
> > underlying storage platform.
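> >
> > To make the idea concrete, a hypothetical helper (not how the Hadoop
> > processors behave today; the class and method names are illustrative only)
> > could copy every dynamic property set on the processor straight onto the
> > Hadoop Configuration before the FileSystem is created:
> >
> > import java.util.Map;
> > import org.apache.hadoop.conf.Configuration;
> > import org.apache.nifi.components.PropertyDescriptor;
> > import org.apache.nifi.processor.ProcessContext;
> >
> > final class DynamicHadoopProps {
> >     // Hypothetical pass-through: any dynamic property the user adds on
> >     // the processor becomes a Hadoop Configuration entry, so
> >     // connector-specific settings need no new processor code.
> >     static void apply(ProcessContext context, Configuration config) {
> >         for (Map.Entry<PropertyDescriptor, String> entry :
> >                 context.getProperties().entrySet()) {
> >             if (entry.getKey().isDynamic()) {
> >                 config.set(entry.getKey().getName(), entry.getValue());
> >             }
> >         }
> >     }
> > }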
> >
> > Any opinion would be welcome
> >
> > PS - sent it again with the proper subject label
> >
>
