[ https://issues.apache.org/jira/browse/MINIFICPP-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marton Szasz updated MINIFICPP-1201:
------------------------------------
Fix Version/s: (was: 0.8.0)
0.9.0
> Integrates MiNiFi C++ with H2O Driverless AI MOJO Scoring Pipeline (C++
> Runtime Python Wrapper) To Do ML Inference on Edge
> --------------------------------------------------------------------------------------------------------------------------
>
> Key: MINIFICPP-1201
> URL: https://issues.apache.org/jira/browse/MINIFICPP-1201
> Project: Apache NiFi MiNiFi C++
> Issue Type: New Feature
> Affects Versions: master
> Environment: Ubuntu 18.04 in AWS EC2
> MiNiFi C++ 0.7.0
> Reporter: James Medel
> Priority: Blocker
> Fix For: 0.9.0
>
>
> *MiNiFi C++ and H2O Driverless AI Integration* via Custom Python Processors:
> Integrates MiNiFi C++ with H2O Driverless AI by using Driverless AI's MOJO
> Scoring Pipeline (in C++ Runtime Python Wrapper) and MiNiFi's Custom Python
> Processor. Uses a Python Processor to execute the MOJO Scoring Pipeline to do
> batch scoring or real-time scoring for one or more predicted labels on
> tabular test data in the incoming flow file content. If the tabular data is
> one row, then the MOJO does real-time scoring. If the tabular data is
> multiple rows, then the MOJO does batch scoring. I would like to contribute
> my processors to MiNiFi C++ as a new feature.
> *One custom Python processor* created for MiNiFi:
> *H2oMojoPwScoring* - Executes H2O Driverless AI's MOJO Scoring Pipeline in
> C++ Runtime Python Wrapper to do batch scoring or real-time scoring on a
> frame of data within each incoming flow file. Requires the user to add the
> *pipeline.mojo* filepath into the "MOJO Pipeline Filepath" property. This
> property is used in the onTrigger(context, session) function to get the
> pipeline.mojo filepath, so we can *pass it into* the
> *daimojo.model(pipeline_mojo_filepath)* function to instantiate our
> *mojo_scorer*. MOJO creation time and uuid are added as individual flow file
> attributes. Then the *flow file content* is *loaded into Datatable* *frame*
> to hold the test data. Then a Python lambda function called compare checks
> whether the datatable frame's header column names equal the expected header
> column names from the mojo scorer. This check is done because the incoming
> data could be missing its header. When the header does not equal the
> expected header, we update the datatable frame header with the mojo
> scorer's expected header. Having the correct
> header works nicely because the *mojo scorer's* *predict(datatable_frame)*
> function needs the header to do the prediction, and it returns a
> predictions datatable frame. The mojo scorer's predict function is *capable
> of doing real-time scoring or batch scoring*; which one it does depends on
> the number of rows in the tabular data. This predictions datatable frame is
> then converted to a pandas dataframe, so we can use pandas'
> to_string(index=False) function to convert the dataframe to a string
> without the dataframe's index.
> Then *the prediction string is written to flow file content*. A flow file
> attribute is added for the number of rows scored. One or more additional flow
> file attributes are added for each predicted label name and its associated
> score. Finally, the flow file is transferred on a success relationship.
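The header check described above can be sketched roughly as follows. This is only an illustration, not the processor's actual code: it uses Python's standard csv module in place of Datatable so that it runs on its own, and the function name ensure_header and the column names in the usage note are hypothetical.

```python
import csv
import io


def ensure_header(csv_text, expected_header):
    """Return CSV text whose first row is expected_header.

    If the first row already equals the expected header, the text is
    returned unchanged. Otherwise the first row is assumed to be data
    and the expected header is prepended.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if rows and rows[0] == list(expected_header):
        return csv_text

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(expected_header)  # supply the missing header
    writer.writerows(rows)            # keep every original row as data
    return out.getvalue()
```

For example, with a hypothetical expected header ["psa_bar", "cool_pwr_kw"], the headerless input "1.0,2.0\n" gains the header line, while input that already starts with that header passes through unchanged. In the processor itself the expected names would come from the mojo scorer rather than a literal list.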
>
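The flow file attributes added after scoring could be assembled along these lines. This is a sketch under assumptions: the attribute key num_rows_scored, the helper names, and the choice to report only the first row's scores are all illustrative, since the description does not say how the scores of a multi-row batch are summarized. The MiNiFi calls that attach the attributes to the flow file are left out.

```python
def scoring_mode(num_rows):
    """One row means real-time scoring; several rows mean batch scoring."""
    return "real-time" if num_rows == 1 else "batch"


def scoring_attributes(label_names, prediction_rows):
    """Build a dict of flow file attributes from a predictions table.

    label_names: names of the predicted label columns.
    prediction_rows: list of rows, each a list of scores aligned
    with label_names.
    """
    # Attribute values are strings, so the count is converted up front.
    attrs = {"num_rows_scored": str(len(prediction_rows))}  # hypothetical key

    # Assumption: only the first row's scores become attributes.
    if prediction_rows:
        for name, score in zip(label_names, prediction_rows[0]):
            attrs[name] = str(score)
    return attrs
```

Each key/value pair in the returned dict would then be added to the flow file as an attribute before the flow file is transferred to the success relationship.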
> *Hydraulic System Condition Monitoring* Data used in MiNiFi Flow:
> The sensor test data I used in this integration comes from Kaggle: Condition
> Monitoring of Hydraulic Systems. I was able to predict hydraulic system
> cooling efficiency through the MiNiFi and H2O integration described above.
> The use case here is hydraulic system predictive maintenance.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)