James Medel created MINIFICPP-1199:
--------------------------------------

             Summary: Integrates MiNiFi C++ with H2O's Driverless AI To Do ML 
Inference on Edge
                 Key: MINIFICPP-1199
                 URL: https://issues.apache.org/jira/browse/MINIFICPP-1199
             Project: Apache NiFi MiNiFi C++
          Issue Type: New Feature
    Affects Versions: master
         Environment: Ubuntu 18.04 in AWS EC2
MiNiFi C++ 0.7.0
            Reporter: James Medel
             Fix For: master


*MiNiFi C++ and H2O Driverless AI Integration* via Custom Python Processors:

Integrates MiNiFi C++ with H2O's Driverless AI by Using Driverless AI's Python 
Scoring Pipeline and MiNiFi's Custom Python Processors. Uses the Python 
Processors to execute the Python Scoring Pipeline scorer to do batch scoring 
and real-time scoring for one or more predicted labels on test data in the 
incoming flow file content. I would like to contribute my processors to MiNiFi 
C++ as a new feature.

 

*3 custom python processors* created for MiNiFi:

*H2oPspScoreRealTime* - Executes H2O Driverless AI's Python Scoring Pipeline to 
do interactive scoring (real-time) scoring on an individual row or list of test 
data within each incoming flow file. Uses H2O's open-source Datatable library 
to load test data into a frame, then converts it to pandas dataframe. Pandas is 
used to convert the pandas dataframe rows to a list of lists, but since each 
flow file passing through this processor should have only 1 row, we extract the 
1st list. Then that list is passed into the Driverless AI's Python 
scorer.score() function to predict one or more predicted labels. The prediction 
is returned to a list. The number of predicted labels is specified when the 
user built the Python Scoring Pipeline in Driverless AI. With that knowledge, 
there is a property for the user to pass in one or more predicted label names 
that will be used as the predicted header. I create a comma separated string 
using the predicted header and predicted value. The predicted header(s) is on 
one line followed by a newline and the predicted value(s) is on the next line 
followed by a newline. The string is written to the flow file content. Flow 
File attributes are added to the flow file for the number of lists scored and 
the predicted label name and its associated score. Finally, the flow file is 
transferred on a success relationship.

 

*H2oPspScoreBatches* - Executes H2O Driverless AI's Python Scoring Pipeline to 
do batch scoring on a frame of data within each incoming flow file. Uses H2O's 
open-source Datatable library to load test data into a frame. Each frame from 
the flow file passing through this processor should have multiple rows. That 
frame is passed into the Driverless AI's Python scorer.score_batch() function 
to predict one or more predicted labels. The prediction is returned to a pandas 
dataframe, then that dataframe is converted to a string, so it can be written 
to the flow file content. Flow File attributes are added to the flow file for 
the number of rows scored. There are also flow file attributes added for the 
predicted label name and its associated score for the first row in the frame. 
Finally, the flow file is transferred on a success relationship.

 

*ConvertDsToCsv* - Converts data source of incoming flow file to csv. Uses 
H2O's open-source Datatable library to load data into a frame, then converts it 
to pandas dataframe. Pandas is used to convert the pandas dataframe to a csv 
and store it into in-memory text stream StringIO without pandas dataframe 
index. The csv string data is grabbed using file read() function on the 
StringIO object, so it can be written to the flow file content. The flow file 
is transferred on a success relationship.

 

*Hydraulic System Condition Monitoring* Data used in MiNiFi Flow:

The sensor test data I used in this integration comes from [Kaggle: Condition 
Monitoring of Hydraulic 
Systems|[https://www.kaggle.com/jjacostupa/condition-monitoring-of-hydraulic-systems#description.txt]].
 I was able to predict hydraulic system cooling efficiency through MiNiFi and 
H2O integration described above. This use case here is hydraulic system 
predictive maintenance.

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to