One other note: you may find additional help on our developers list, 
[email protected]. This list is more focused on user issues and 
functionality, while the developers list gets much deeper into the weeds on 
coding.

Andy LoPresto
[email protected]
[email protected]
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On Jan 27, 2017, at 11:04 AM, Andy LoPresto <[email protected]> wrote:
> 
> Hi Aakash.
> 
> Last summer I had an intern working for me who investigated using machine 
> learning (unsupervised anomaly detection using kNN and LOF) against NiFi 
> provenance data to perform error identification and build a processor 
> recommendation engine. I can’t share the work as it is company internal, but 
> there is definitely a growing community and interest in what you’re 
> discussing.
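
As a rough illustration of the kNN idea mentioned above (this is not the internal work, which was not shared): a minimal sketch that scores each point by its mean distance to its k nearest neighbours, so isolated points get high scores. The feature names (duration, content size) and the sample values are hypothetical, and LOF itself is not implemented here.

```python
import math


def knn_scores(points, k=2):
    """Return, for each point, the mean Euclidean distance to its k nearest neighbours."""
    if k < 1 or k >= len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


# Hypothetical features per provenance event: (duration in ms, content size in KB).
events = [(12.0, 4.0), (11.0, 5.0), (13.0, 4.5), (12.5, 5.5), (95.0, 40.0)]
scores = knn_scores(events, k=2)

# Index of the event that sits farthest from its neighbours.
most_anomalous = max(range(len(events)), key=scores.__getitem__)
print(most_anomalous, [round(s, 2) for s in scores])
```

The brute-force pairwise distances are fine for a small sample; for real provenance volumes a library implementation such as the scikit-learn one referenced in [1] would be the more sensible choice.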
> 
> If you truly want to distribute the computational load of performing the 
> analysis to edge nodes, writing custom processors is likely a requirement. 
> Can I make two suggestions before you begin writing code, though? First, 
> investigate if you could deploy something like scikit-learn (Python) [1] or 
> Apache Spark-ML [2] to reside alongside NiFi on the edge nodes (obviously 
> depends on HW resources). Our early efforts involved writing custom NiFi 
> code, but it turned out it was much easier to offload the data to 
> scikit-learn and then ingest the results back into NiFi to continue data 
> flow, while leaving the computation to an external system.
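
One way the offloading pattern described above could look, as a sketch under stated assumptions: an external script reads newline-delimited JSON records on one stream and writes them back with a score attached, so NiFi only has to hand data out and ingest the result. The mail does not say how the hand-off was actually done; invoking a script like this from a processor such as ExecuteStreamCommand is an assumption, as are the field names. The scoring is a stand-in (a modified z-score based on the median absolute deviation), not the model from the thread, and uses only the standard library so it runs anywhere.

```python
import io
import json
import statistics


def score_records(records, field, threshold=3.5):
    """Attach a modified z-score (median absolute deviation based) to each record."""
    values = [float(r[field]) for r in records]
    if not values:
        return records
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    for record, value in zip(records, values):
        # If all deviations are zero the data has no spread, so nothing is flagged.
        score = 0.0 if mad == 0 else 0.6745 * abs(value - median) / mad
        record["anomaly_score"] = round(score, 3)
        record["is_anomaly"] = score > threshold
    return records


def run(in_stream, out_stream, field="duration_ms"):
    """Read one JSON object per line, score the batch, write one JSON object per line."""
    records = [json.loads(line) for line in in_stream if line.strip()]
    for record in score_records(records, field):
        out_stream.write(json.dumps(record) + "\n")


# As a standalone script this would be run(sys.stdin, sys.stdout);
# in-memory streams keep the demo self-contained.
demo_in = io.StringIO(
    "\n".join(
        json.dumps({"id": i, "duration_ms": d})
        for i, d in enumerate([10, 12, 11, 13, 250])
    )
    + "\n"
)
demo_out = io.StringIO()
run(demo_in, demo_out)
print(demo_out.getvalue(), end="")
```

Keeping the scoring in a plain function, separate from the stream handling, means the same logic could later be swapped for a scikit-learn model without touching the NiFi side of the flow.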
> 
> If you really want the computation to be running inside the NiFi JVM, also 
> look at the ExecuteScript processor before trying to write a custom 
> processor. While NiFi makes it easy to deploy custom code, the SDLC imposes 
> a recurring delay: after you generate the Maven pom for the NAR, you will 
> have to write the code in an IDE, test it, compile, build the NAR, drop it 
> into the NiFi lib directory, and restart the entire application every time 
> you make a change. To prototype your model, I recommend using the 
> ExecuteScript processor, which will provide immediate feedback. It also 
> abstracts away a lot of the boilerplate framework so you can hyper-focus on 
> the domain work. Matt Burgess has written a number of great articles which 
> should get you up and running with it [3].
> 
> Once you have a model and computation you’re confident in, then it’s easy to 
> translate it to a dedicated custom processor and deploy it. I find this 
> methodology saves me a lot of time and a bit of frustration. Good luck. I’m 
> very curious to see what your work yields.
> 
> [1] http://scikit-learn.org/stable/
> [2] https://spark.apache.org/mllib/
> [3] https://funnifi.blogspot.com
> 
> 
> 
> Andy LoPresto
> [email protected]
> [email protected]
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
> 
>> On Jan 27, 2017, at 5:45 AM, Aldrin Piri <[email protected]> wrote:
>> 
>> Hi, Aakash!
>> 
>> I have not seen any discussion of such processors on the lists 
>> specifically, although I have heard people mention assorted libraries that 
>> might be a good fit for the NiFi ecosystem's intended purposes. There has 
>> been some foundational work, such as the following issues, which allows 
>> processors to use the state management features in NiFi to manage the flow 
>> of data for some higher-level inspection/analysis.
>> 
>> https://issues.apache.org/jira/browse/NIFI-1582
>> https://issues.apache.org/jira/browse/NIFI-1682
>> https://issues.apache.org/jira/browse/NIFI-2590
>> 
>> If my understanding of your question is correct, I believe your notion of 
>> distribution may not directly align with the intended focus of NiFi, but 
>> there could certainly be some aspects that work. Would you be willing to 
>> expand in greater detail on how you envision such processors interacting 
>> with data, and possibly name some of the libraries you were considering in 
>> your initial message?
>> 
>> Thanks!
>> 
>> --aldrin
>> 
>> On Fri, Jan 27, 2017 at 7:38 AM, Aakash Khochare <[email protected]> wrote:
>> Greetings,
>> 
>> While I understand that the primary use of NiFi/MiNiFi is for secure data 
>> ingress with the added benefit of Provenance, what are the views of the 
>> community on writing Processors that implement Machine Learning Algorithms 
>> and distribute them across Edge + Cloud using NiFi and MiNiFi? Has anyone 
>> tried writing such processors?
>> 
>> Regards,
>> 
>> Aakash Khochare
>> 
>> 
>> 
> 
