Just wanted to add one more point which IMHO is just as important... Certain “artifacts” (i.e., NARs that depend on libraries which are not ASF-friendly) may not fit the ASF licensing requirements of a genuine Apache NiFi distribution, yet add great value for the greater community of NiFi users, so having them NOT be part of the official NiFi distribution is a value in itself.
Cheers
Oleg

> On Feb 22, 2017, at 12:52 PM, Oleg Zhurakousky <[email protected]> wrote:
>
> Adam
>
> I 100% agree with your comment on "official/sanctioned”. With an external
> artifact registry such as BinTray, for example, or GitHub, one cannot control
> what is there, only how to get it. The final decision is left to the end
> user.
> Artifacts could be rated, and/or Apache NiFi (and/or commercial distributions
> of NiFi) could “endorse” or “un-endorse” certain artifacts, and IMHO that is
> perfectly fine. On top of that, a future distribution of NiFi could have
> configuration to account for the “endorsed/supported” artifacts, yet it
> should not stop one from downloading and trying something new.
>
> Cheers
> Oleg
>
>> On Feb 22, 2017, at 12:43 PM, Adam Lamar <[email protected]> wrote:
>>
>> Hey all,
>>
>> I can understand Andre's perspective - when I was building the ListS3
>> processor, I mostly just copied the bits that made sense from ListHDFS and
>> ListFile. That worked, but it's a poor way to ensure consistency across
>> List* processors.
>>
>> As a once-in-a-while contributor, I love the idea that community
>> contributions are respected and we're not dropping them, because they solve
>> real needs right now, and it isn't clear another approach would be better.
>>
>> And I disagree slightly with the notion that an artifact registry will
>> solve the problem - I think it could make it worse, at least from a
>> consistency point of view. Taming _is_ important, which is one reason
>> registry communities have official/sanctioned modules. Quality and
>> interoperability can vary vastly.
>>
>> By convention, it seems like NiFi already has a handful of well-understood
>> patterns - List, Fetch, Get, Put, etc. all mean something specific in
>> processor terms. Is there a reason not to formalize those patterns in the
>> code as well? That would help with processor consistency, and if done
>> right, it may even be easier to write new processors, fix bugs, etc.
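[Editor's note] The "formalize the patterns in code" idea can be sketched outside NiFi itself. The toy below is NOT NiFi's actual API - it is a hypothetical, dependency-free illustration of the template-method idea: an abstract base class owns the incremental-listing state and the fixed order of operations (emit results first, then advance the watermark), so a concrete List* processor only supplies the store-specific listing call and cannot get the ordering wrong.

```java
// Hypothetical sketch (not NiFi's real API): a generalized base class for
// List* processors. The base owns the incremental-listing state and the
// order of operations, so concrete processors cannot repeat a
// commit-ordering mistake.
import java.util.ArrayList;
import java.util.List;

abstract class GeneralizedListProcessor<T> {
    // Highest timestamp already emitted; survives across onTrigger() runs.
    private long lastTimestamp = Long.MIN_VALUE;

    /** The only store-specific hook: list entities newer than the watermark. */
    protected abstract List<T> performListing(long minTimestampExclusive);

    /** Store-specific: extract the entity's timestamp. */
    protected abstract long timestampOf(T entity);

    /** Template method: the fixed ordering lives here, not in each subclass. */
    public final List<T> onTrigger() {
        List<T> listed = performListing(lastTimestamp);
        // 1) hand the results off (stand-in for FlowFile transfer + session commit)...
        List<T> emitted = new ArrayList<>(listed);
        // 2) ...and only THEN advance the stored watermark, so a failure
        //    before this point cannot silently skip entities on the next run.
        for (T e : listed) {
            lastTimestamp = Math.max(lastTimestamp, timestampOf(e));
        }
        return emitted;
    }
}

// Toy stand-in for ListS3/ListHDFS: the "timestamps" are just the values.
class ListNumbers extends GeneralizedListProcessor<Long> {
    private final List<Long> backing;

    ListNumbers(List<Long> backing) {
        this.backing = backing;
    }

    @Override
    protected List<Long> performListing(long minExclusive) {
        List<Long> out = new ArrayList<>();
        for (long v : backing) {
            if (v > minExclusive) {
                out.add(v);
            }
        }
        return out;
    }

    @Override
    protected long timestampOf(Long entity) {
        return entity;
    }
}
```

NiFi's real base class (the one Adam says "already exists") works against ProcessSession and managed state rather than plain lists; the sketch only shows why pushing the ordering into a shared base class removes a whole class of commit bugs from every subclass at once.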
>>
>> For example, ListS3 initially shipped with some bad session commit()
>> behavior, which was obvious once identified, but a generalized
>> AbstractListProcessor (higher-level than the one that already exists) could
>> make it easier to avoid this class of bug.
>>
>> Admittedly this could be a lot of work.
>>
>> Cheers,
>> Adam
>>
>> On Wed, Feb 22, 2017 at 8:38 AM, Oleg Zhurakousky <[email protected]> wrote:
>>
>>> I’ll second Pierre.
>>>
>>> Yes, with the current deployment model the number of processors and the
>>> size of the NiFi distribution are a concern, simply because they grow with
>>> each release. But that should not be the driver to start jamming more
>>> functionality into existing processors which on the surface may look
>>> related (even if they are).
>>> Basically, a processor should never be complex with regard to being
>>> understood by a non-technical end user, so “specialization” always takes
>>> precedence here, since it limits “configuration” and thus makes such a
>>> processor simpler. It also helps with maintenance and management of such a
>>> processor by the developer. Also, having multiple related processors will
>>> promote healthy competition, where my MyPutHDFS may in certain cases be
>>> better/faster than YourPutHDFS - and why not have both?
>>>
>>> The “artifact registry” (flow, extension, template, etc.) is the only
>>> answer here, since it removes the “proliferation” and the need for
>>> “taming” anything from the picture. With an “artifact registry”, whether
>>> there is one processor or one million, the NiFi size/state will always
>>> remain constant and small.
>>>
>>> Cheers
>>> Oleg
>>>
>>>> On Feb 22, 2017, at 6:05 AM, Pierre Villard <[email protected]> wrote:
>>>>
>>>> Hey guys,
>>>>
>>>> Thanks for the thread Andre.
>>>>
>>>> +1 to James' answer.
>>>>
>>>> I understand the interest that a single processor to connect to all the
>>>> back ends would provide...
and we could document/improve the PutHDFS to ease
>>>> such use, but I really don't think it would benefit the user experience.
>>>> That may be interesting in some cases for some users, but I don't think
>>>> that would be the majority.
>>>>
>>>> I believe NiFi is great for one reason: you have a lot of specialized
>>>> processors that are really easy to use and efficient for what they've
>>>> been designed for.
>>>>
>>>> Let's ask ourselves the question the other way: with the NiFi Registry on
>>>> its way, what is the problem with having multiple processors for each
>>>> back end? I don't really see the issue here. OK, we have a lot of
>>>> processors (but I believe this is a good point for NiFi, for user
>>>> experience, for advertising, etc. - maybe we should improve the processor
>>>> listing, though again, this will be part of the NiFi Registry work), and
>>>> it generates a heavy NiFi binary (but that will be solved with the
>>>> registry), but that's all, no?
>>>>
>>>> Also agreed on the positioning aspect: IMO NiFi should not be tightly
>>>> tied to the Hadoop ecosystem. There are a lot of users using NiFi with
>>>> absolutely no relation to Hadoop. Not sure that would send the right
>>>> "signal".
>>>>
>>>> Pierre
>>>>
>>>> 2017-02-22 6:50 GMT+01:00 Andre <[email protected]>:
>>>>
>>>>> Andrew,
>>>>>
>>>>> On Wed, Feb 22, 2017 at 11:21 AM, Andrew Grande <[email protected]> wrote:
>>>>>
>>>>>> I am observing one assumption in this thread. For some reason we are
>>>>>> implying all these will be Hadoop-compatible file systems. They don't
>>>>>> always have an HDFS plugin, nor should they as a mandatory requirement.
>>>>>
>>>>> You are partially correct.
>>>>>
>>>>> There is a direct assumption in the availability of an HCFS (thanks
>>>>> Matt!) implementation.
>>>>>
>>>>> This is the case with:
>>>>>
>>>>> * Windows Azure Blob Storage
>>>>> * Google Cloud Storage Connector
>>>>> * MapR FileSystem (currently done via NAR recompilation / mvn profile)
>>>>> * Alluxio
>>>>> * Isilon (via HDFS)
>>>>> * others
>>>>>
>>>>> But I wouldn't say this will apply to every other storage system, and in
>>>>> certain cases it may not even be necessary (e.g. Isilon scale-out
>>>>> storage may be reached using its native HDFS-compatible interfaces).
>>>>>
>>>>>> Untie completely from the Hadoop NAR. This allows for effective MiNiFi
>>>>>> interaction without the weight of the Hadoop libs, for example.
>>>>>> Massive size savings where it matters.
>>>>>
>>>>> Are you suggesting a use case where MiNiFi agents interact directly with
>>>>> cloud storage, without relying on NiFi hubs to do that?
>>>>>
>>>>>> For the deployment, it's easy enough for an admin to either rely on a
>>>>>> standard tar or rpm if the NAR modules are already available in the
>>>>>> distro (well, I won't talk registry till it arrives). Mounting a common
>>>>>> directory on every node or distributing additional jars everywhere,
>>>>>> plus configs, and then keeping it all consistent is something which can
>>>>>> be avoided by simpler packaging.
>>>>>
>>>>> As long as the NAR or RPM supports your use case, which is not the case
>>>>> for people running NiFi with MapR-FS, for example. For those, a
>>>>> recompilation is required anyway. A flexible processor may remove the
>>>>> need to recompile (I am currently playing with the classpath
>>>>> implications for MapR users).
>>>>>
>>>>> Cheers
>>>
>>>
>
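[Editor's note] The HCFS implementations listed above all plug into the same Hadoop FileSystem API by registering a class against a URI scheme, which is what lets the existing HDFS processors reach them. A minimal, illustrative core-site.xml sketch follows; the property keys and class names are the ones the connectors documented, but treat them as examples and verify them against the connector versions you actually deploy:

```xml
<!-- Illustrative only: maps URI schemes to HCFS implementations.
     Verify class names against the connector versions you deploy. -->
<configuration>
  <!-- Google Cloud Storage Connector: gs:// -->
  <property>
    <name>fs.gs.impl</name>
    <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem</value>
  </property>
  <!-- Azure Blob Storage (hadoop-azure): wasb:// -->
  <property>
    <name>fs.wasb.impl</name>
    <value>org.apache.hadoop.fs.azure.NativeAzureFileSystem</value>
  </property>
  <!-- Alluxio client: alluxio:// -->
  <property>
    <name>fs.alluxio.impl</name>
    <value>alluxio.hadoop.FileSystem</value>
  </property>
</configuration>
```

In NiFi such a file is referenced from the Hadoop Configuration Resources property of the HDFS processors, with the connector jars made visible on the classpath - exactly the jar-distribution burden Andrew describes above, which is why the packaging question matters.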
