Team The below two commits raise serious concerns.
I want to be clear here and point out that h2o is cool. Having such integration is a neat concept and idea and one that certainly warrants determination on how best to do so. My issue is with these two commits as it relates to licensing and maintenance. License: We should support vendors like this wanting to bring their capabilities into NiFi generally sure. But the licensing and mode of use is critical. In talking about this with the contributor for NiFi as well it is clear that at least some or important portions of this require the user to have a 'driverless ai license' so they can include their jar or build pipelines or whatnot. Thus it isn't usable without that first. So it might be the case that this stuff is source dependent only on ASF compatible licenses in terms of source - but it certainly doesn't seem to be true in terms of binary dependencies. Where in the PR or JIRA is there any discussion or review of the licensing? I see plenty from James to believe this needs to be reverted. Any contrary info? https://github.com/apache/nifi/pull/4242#issuecomment-622654527 Maintenance: We should find a way for vendors to offer extension points like this without having to take on the burden of maintenance. How can we possibly do this well? We're learning this lesson the hard way in NiFi itself and this is why the registry is being formulated. JIRA/hygiene: https://issues.apache.org/jira/browse/MINIFICPP-1199 and https://issues.apache.org/jira/browse/MINIFICPP-1201 Both are open. Yet we've merged commits that claim to be against each. I believe both of these commits need to be reverted as I do not believe the licensing considerations have been addressed. I'd like to see discussion on the above maintenance concern as well but that is less pressing. Thanks Joe commit 7206c62240647520cf35649868d5d87903a256c2 Author: James Medel <[email protected]> Date: Wed Apr 29 12:38:04 2020 -0700 MINIFI-1201: Integrate H2O Driverless AI MSP in MiNFi (#766) MINIFI-1201: Integrate H2O Driverless AI MSP in MiNFi (#766) commit 6e5f96518764df7791519c0ee625a94a207ddc69 Author: James Medel <[email protected]> Date: Wed Apr 29 12:37:00 2020 -0700 MINIFI-1199: Integrate H2O Driverless AI PSP in MiNiFi (#763) * MINIFI-1199: Integrate H2O Driverless AI in MiNiFi MiNiFi C++ and H2O Driverless AI Integration via Custom Python Processors: Integrates MiNiFi C++ with H2O's Driverless AI by Using Driverless AI's Python Scoring Pipeline and MiNiFi's Custom Python Processors. Uses the Python Processors to execute the Python Scoring Pipeline scorer to do batch scoring and real-time scoring for one or more predicted labels on test data in the incoming flow file content. I would like to contribute my processors to MiNiFi C++ as a new feature. * Update H2oPspScoreBatches processor This update includes passing "index=False" to "batch_scores_df.to_string(index=False)". By updating this line of code, we tell pandas to not include the DataFrame index when converting the DataFrame to a string. The reason for this update is that we don't want this extra column pandas tries to add to our predictions frame, we only want the predictions. Thus, later when the predictions get saved to a csv file, it will only include the predictions. * Moved ConvertDsToCsv to h2o base dir Since this ConvertDsToCsv python processor is used by the Python Scoring Pipeline and MOJO Scoring Pipeline processors, I moved ConvertDsToCsv to h2o base dir for easier access to it.
