One other note: you may find additional help on our developers list, [email protected]. This list focuses more on user issues and functionality, while the developers list goes much deeper into the weeds on coding.
Andy LoPresto
[email protected]
[email protected]
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69

> On Jan 27, 2017, at 11:04 AM, Andy LoPresto <[email protected]> wrote:
>
> Hi Aakash.
>
> Last summer I had an intern working for me who investigated using machine
> learning (unsupervised anomaly detection using kNN and LOF) against NiFi
> provenance data to perform error identification and build a processor
> recommendation engine. I can’t share the work as it is company internal, but
> there is definitely a growing community and interest in what you’re
> discussing.
>
> If you truly want to distribute the computational load of performing the
> analysis to edge nodes, writing custom processors is likely a requirement.
> Can I make two suggestions before you begin writing code, though? First,
> investigate if you could deploy something like scikit-learn (Python) [1] or
> Apache Spark-ML [2] to reside alongside NiFi on the edge nodes (obviously
> depends on HW resources). Our early efforts involved writing custom NiFi
> code, but it turned out it was much easier to offload the data to
> scikit-learn and then ingest the results back into NiFi to continue data
> flow, while leaving the computation to an external system.
>
> If you really want the computation to be running inside the NiFi JVM, also
> look at the ExecuteScript processor before trying to write a custom
> processor. While NiFi makes it easy to deploy custom code, the SDLC can
> provide a few constant delays: after you generate the Maven pom for the NAR,
> you will have to write the code in an IDE, test it, compile, build the NAR,
> drop it into the NiFi lib, and restart the entire application every time you
> make a change. To prototype your model, I recommend using the ES processor,
> which will provide immediate feedback. It also abstracts a lot of the
> boilerplate framework so you can hyper focus on the domain work. Matt Burgess
> has written a number of great articles which should get you up and running
> with it [3].
>
> Once you have a model and computation you’re confident in, then it’s easy to
> translate it to a dedicated custom processor and deploy it. I find this
> methodology saves me a lot of time and a bit of frustration. Good luck. I’m
> very curious to see what your work yields.
>
> [1] http://scikit-learn.org/stable/
> [2] https://spark.apache.org/mllib/
> [3] https://funnifi.blogspot.com
>
> Andy LoPresto
> [email protected]
> [email protected]
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69
>
>> On Jan 27, 2017, at 5:45 AM, Aldrin Piri <[email protected]> wrote:
>>
>> Hi, Aakash!
>>
>> To my knowledge, I have not seen any discussion about such processors on the
>> lists specifically although have heard people mentioning assorted libraries
>> that might be a good fit for the NiFi ecosystem's intended purposes. There
>> has been some foundational work such as the following issues which allow
>> processors to make use of the state management features in NiFi for the sake
>> of managing the flow of data to do some higher level inspection/analysis.
>>
>> https://issues.apache.org/jira/browse/NIFI-1582
>> https://issues.apache.org/jira/browse/NIFI-1682
>> https://issues.apache.org/jira/browse/NIFI-2590
>>
>> If my understanding of your question is correct, I believe your notion of
>> distribution may not directly align with the intended focus of NiFi, but
>> certainly could be some aspects that work. Would you be willing to expand
>> in greater detail how you would envision such processors interacting with
>> data and possibly provide some of the libraries you were considering in your
>> initial message?
>>
>> Thanks!
>>
>> --aldrin
>>
>> On Fri, Jan 27, 2017 at 7:38 AM, Aakash Khochare <[email protected]> wrote:
>> Greetings,
>>
>> While I understand that the primary use of NiFi/MiNiFi is for secure data
>> ingress with the added benefit of Provenance, what are the views of the
>> community on writing Processors that implement Machine Learning Algorithms
>> and distribute them across Edge+ Cloud using NiFi and MiNiFi? Has anyone
>> tried writing such processors?
>>
>> Regards,
>>
>> Aakash Khochare
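
For anyone wanting a feel for the kNN-style anomaly detection mentioned in the quoted message, here is a rough sketch in plain Python. It is not the internal work referred to above, and it is not LOF: it uses a simpler technique, scoring each point by its mean distance to its k nearest neighbours. The feature layout (duration in ms, size in KB), the sample values, and the `factor` threshold are all hypothetical, chosen only for illustration.

```python
import math
import statistics


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_anomaly_scores(points, k=2):
    """Score each point by its mean distance to its k nearest neighbours."""
    if k < 1 or k >= len(points):
        raise ValueError("k must satisfy 1 <= k < len(points)")
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, nearest first.
        dists = sorted(euclidean(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


def flag_anomalies(points, k=2, factor=3.0):
    """Return indices whose score exceeds factor times the median score."""
    scores = knn_anomaly_scores(points, k)
    median = statistics.median(scores)
    return [i for i, s in enumerate(scores) if s > factor * median]


# Hypothetical (duration_ms, size_kb) features, one tuple per event.
sample_events = [(10, 1.0), (11, 1.1), (10.5, 0.9), (9.8, 1.0), (60, 9.0)]

if __name__ == "__main__":
    print(flag_anomalies(sample_events))  # prints [4]
```

In practice the features would need scaling before distances mean much, and a library such as scikit-learn is likely a better choice than hand-rolled code once the prototype looks promising.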
