James Medel created MINIFICPP-1201:
--------------------------------------

             Summary: Integrates MiNiFi C++ with H2O Driverless AI MOJO Scoring 
Pipeline (C++ Runtime Python Wrapper) To Do ML Inference on Edge
                 Key: MINIFICPP-1201
                 URL: https://issues.apache.org/jira/browse/MINIFICPP-1201
             Project: Apache NiFi MiNiFi C++
          Issue Type: New Feature
    Affects Versions: master
         Environment: Ubuntu 18.04 in AWS EC2
MiNiFi C++ 0.7.0
            Reporter: James Medel
             Fix For: master


*MiNiFi C++ and H2O Driverless AI Integration* via Custom Python Processors:

This feature integrates MiNiFi C++ with H2O Driverless AI by combining 
Driverless AI's MOJO Scoring Pipeline (in the C++ Runtime Python Wrapper) with 
MiNiFi's custom Python processors. A Python processor executes the MOJO Scoring 
Pipeline to do batch or real-time scoring for one or more predicted labels on 
the tabular test data in the incoming flow file content. If the tabular data 
has one row, the MOJO does real-time scoring; if it has multiple rows, the MOJO 
does batch scoring. I would like to contribute my processors to MiNiFi C++ as a 
new feature.
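
A minimal sketch of how the row count separates the two modes, assuming the 
flow file content is CSV text. The function name scoring_mode and its 
has_header parameter are hypothetical and not part of the processor; the call 
into the MOJO is the same in both cases, only the number of data rows differs.

```python
import csv
import io


def scoring_mode(flow_file_content: str, has_header: bool = True) -> str:
    """Return "real-time" for one data row, "batch" for several (hypothetical helper)."""
    # Parse the CSV text and ignore completely empty lines.
    rows = [row for row in csv.reader(io.StringIO(flow_file_content)) if row]
    # Do not count the header line as data when one is present.
    data_rows = len(rows) - 1 if has_header else len(rows)
    if data_rows <= 0:
        raise ValueError("flow file content holds no data rows")
    return "real-time" if data_rows == 1 else "batch"
```

For example, content with a header and a single data row is classified as 
"real-time", and content with a header and two or more data rows as "batch".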

*One custom Python processor* created for MiNiFi:

*H2oMojoPwScoring* - Executes H2O Driverless AI's MOJO Scoring Pipeline in the 
C++ Runtime Python Wrapper to do batch or real-time scoring on a frame of data 
within each incoming flow file. The user must set the "MOJO Pipeline Filepath" 
property to the *pipeline.mojo* filepath. The onTrigger(context, session) 
function reads this property and *passes it into* the 
*daimojo.model(pipeline_mojo_filepath)* function to instantiate the 
*mojo_scorer*. The MOJO creation time and uuid are added as individual flow 
file attributes.

Next, the *flow file content* is *loaded into a Datatable* *frame* to hold the 
test data. A Python lambda function called compare checks whether the datatable 
frame's header column names equal the header column names that the mojo scorer 
expects. This check is needed because the incoming data may lack a header row: 
when the frame's column names do not match the expected names, the processor 
replaces them with the mojo scorer's expected header. The correct header 
matters because the *mojo scorer's* *predict(datatable_frame)* function needs 
it to do the prediction, which returns a predictions datatable frame. The 
predict function is *capable of doing real-time scoring or batch scoring*; 
which one happens depends only on the number of rows in the tabular data.

The predictions datatable frame is then converted to a pandas dataframe, so 
that pandas' to_string(index=False) function can convert it to a string without 
the dataframe's index. *The prediction string is written to the flow file 
content*. One flow file attribute is added for the number of rows scored, and 
one or more further attributes are added for each predicted label name and its 
associated score. Finally, the flow file is transferred to the success 
relationship.
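
The header check described above could look roughly like the following. This 
is a sketch under assumptions, not the processor's code: it uses Python's csv 
module in place of datatable, the ensure_header helper and the sensor_a and 
sensor_b column names are hypothetical, and in the processor the expected names 
would come from the mojo scorer.

```python
import csv
import io


def ensure_header(content, expected_names):
    """Return (header, data_rows), supplying the expected header if it is missing.

    Hypothetical helper: stands in for comparing a datatable frame's column
    names with the names the mojo scorer expects.
    """
    expected = list(expected_names)
    rows = [row for row in csv.reader(io.StringIO(content)) if row]
    if not rows:
        raise ValueError("flow file content is empty")
    first = [cell.strip() for cell in rows[0]]
    if first == expected:
        # The header is present and correct; everything after it is data.
        return expected, rows[1:]
    if len(first) != len(expected):
        raise ValueError("column count does not match the expected header")
    # No matching header: treat every row as data and use the expected names.
    return expected, rows


# Hypothetical column names, for illustration only.
header, data = ensure_header("1.0,2.0\n3.0,4.0\n", ["sensor_a", "sensor_b"])
rows_scored = len(data)  # value for a "number of rows scored" attribute
```

Here the content has no header, so both lines are kept as data rows and the 
expected names are applied; the row count can then be recorded as a flow file 
attribute.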

*Hydraulic System Condition Monitoring* Data used in MiNiFi Flow:

The sensor test data I used in this integration comes from Kaggle: Condition 
Monitoring of Hydraulic Systems. I was able to predict hydraulic system cooling 
efficiency through the MiNiFi and H2O integration described above. The use case 
is predictive maintenance of hydraulic systems.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
