This is a spin-off from the discussion on the MiNiFi C++ 0.1.0 release thread. I assume a hub-and-spoke NiFi/MiNiFi C++ architecture.
As discussed on that thread, I am concerned about the existing choice of data provenance store, the implications that choice leads to, and the current data provenance requirements for MiNiFi C++. MiNiFi C++ must be highly efficient and have a minimal footprint so that it can run as a background process and on embedded devices. Performance and space are therefore priorities, as is the ability to communicate the needed information to the NiFi hub (i.e. there is neither the space for a large, unindexed data provenance archive locally nor the processing power to handle one).

The data provenance registry must be:
1) fault tolerant,
2) easy to purge,
3) fast to write,
4) easily accessible in session,
5) easily accessible post-session.

The current choice (LevelDB) meets #3, but not the other four requirements. LevelDB is prone to corruption if the application fails during a write (fails #1). LevelDB has no secondary indexes, so if the keys are UUIDs there is no efficient way to sort by date or by parent/child relationship (fails #2, #4, #5).

The choice of provenance store should satisfy as many of these requirements as possible. For a permanent store, the options would be a very lightweight database or something fault resistant such as LMDB. I don't have a preference, as long as it addresses as many of the criteria as possible and absolutely satisfies #1.

One solution to #4 and #5 could be for the entire provenance tree inside MiNiFi C++ to ride along with the flowfile and transfer to NiFi (including through descendants). I see this as something of a requirement as well, since it is the only efficient way to provide cradle-to-grave provenance across the entire MiNiFi/NiFi system without heavy post-processing to reconstruct the tree. While this adds slightly to the payload sent between MiNiFi and NiFi, the cost is negligible compared to querying for the tree after the fact, especially where MiNiFi is embedded or runs on an IoT device.

Any thoughts?
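To make the indexing point concrete, here is a minimal in-memory sketch of what requirements #2, #4 and #5 ask of a store: a primary map keyed by UUID plus two secondary indexes, one by timestamp (so purging old events touches only the expired prefix instead of scanning every UUID) and one from parent to children (so lineage can be walked without reconstructing the tree). All names here (ProvenanceEvent, IndexedProvenanceStore, purge_older_than, ancestors) are hypothetical and are not the MiNiFi C++ API. It deliberately does not address durability (#1); in practice these indexes would have to live inside whatever transactional store is chosen.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical provenance event; field names are illustrative and are not
// the MiNiFi C++ ProvenanceEventRecord API.
struct ProvenanceEvent {
  std::string uuid;                  // primary key
  std::int64_t timestamp_ms;         // event time, used by the time index
  std::vector<std::string> parents;  // UUIDs of parent events
};

// In-memory only: durability (#1) would have to come from the backing store.
class IndexedProvenanceStore {
 public:
  // Append-only write (#3): rejects a UUID that is already stored.
  bool put(const ProvenanceEvent& e) {
    if (!events_.emplace(e.uuid, e).second) return false;
    by_time_.emplace(e.timestamp_ms, e.uuid);
    for (const auto& p : e.parents) children_[p].insert(e.uuid);
    return true;
  }

  // Purge (#2): drop every event strictly older than cutoff_ms. The time
  // index means only the purged prefix is visited, not the whole store.
  std::size_t purge_older_than(std::int64_t cutoff_ms) {
    auto end = by_time_.lower_bound(cutoff_ms);
    std::size_t removed = 0;
    for (auto it = by_time_.begin(); it != end; ++it) {
      auto ev = events_.find(it->second);
      if (ev == events_.end()) continue;
      // Unlink this event from its parents' child sets.
      for (const auto& p : ev->second.parents) {
        auto c = children_.find(p);
        if (c == children_.end()) continue;
        c->second.erase(ev->first);
        if (c->second.empty()) children_.erase(c);
      }
      children_.erase(ev->first);  // its own child set goes with it
      events_.erase(ev);
      ++removed;
    }
    by_time_.erase(by_time_.begin(), end);
    return removed;
  }

  // Direct children via the parent/child index (#4, #5).
  std::vector<std::string> children(const std::string& uuid) const {
    auto it = children_.find(uuid);
    if (it == children_.end()) return {};
    return std::vector<std::string>(it->second.begin(), it->second.end());
  }

  // All ancestors still in the store, found by walking parent links (#4, #5).
  std::vector<std::string> ancestors(const std::string& uuid) const {
    std::vector<std::string> out;
    std::set<std::string> seen{uuid};
    std::vector<std::string> pending{uuid};
    while (!pending.empty()) {
      std::string cur = pending.back();
      pending.pop_back();
      auto it = events_.find(cur);
      if (it == events_.end()) continue;
      for (const auto& p : it->second.parents) {
        if (events_.count(p) != 0 && seen.insert(p).second) {
          out.push_back(p);
          pending.push_back(p);
        }
      }
    }
    return out;
  }

  std::size_t size() const { return events_.size(); }

 private:
  std::unordered_map<std::string, ProvenanceEvent> events_;          // by UUID
  std::multimap<std::int64_t, std::string> by_time_;                 // by time
  std::unordered_map<std::string, std::set<std::string>> children_;  // parent -> children
};
```

The design choice worth noting is that each index costs one extra insert per write, which keeps #3 cheap while making purge and lineage queries avoid full scans. A list like the one ancestors() returns is also roughly the payload that would need to travel with a flowfile under the ride-along idea above.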
-- View this message in context: http://apache-nifi-developer-list.39713.n7.nabble.com/MiNiFi-C-Data-Provenance-and-Related-Issues-tp14024.html Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.
