[ https://issues.apache.org/jira/browse/NIFI-4775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16906227#comment-16906227 ]

Joseph Witt commented on NIFI-4775:
-----------------------------------

[~devriesb] The current master build (with your update) results in a tar.gz of

1572810473 Aug 13 09:42 nifi-1.10.0-SNAPSHOT-bin.tar.gz

which is 1.46 GB. So ultimately every MB at this point does indeed count.

I agree this isn't as drastic a concern as the other examples I gave, and I agree this new implementation could be worth folks considering and should be made available. We would not want folks swapping their core framework nar the way they swap extension-based nars, so I guess this is the best path. But in reality we need, as a whole community, to be putting a lot of effort and momentum into moving to a far faster/smaller build and sourcing extensions/components at runtime as needed. Anyway, this is not the place to debate that aspect, so I am fine with you including this, but again please realize we're at the max here.

* For the JIRA title, thanks for fixing.
* For the bug fix, thanks for creating a specific JIRA for that.
* What about ensuring your docs show up in the output?
* Additionally, there is now a rocksdbjni 6.2.2 release (as of when I looked last night, I believe) and some of the bug fixes seem worth considering. Any reason not to update that?

> Create a FlowFile repo backed by RocksDB
> ----------------------------------------
>
>                 Key: NIFI-4775
>                 URL: https://issues.apache.org/jira/browse/NIFI-4775
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework
>            Reporter: Mark Payne
>            Assignee: Brandon DeVries
>            Priority: Major
>             Fix For: 1.10.0
>
>         Attachments: RocksDBFlowFileRepo.html, rocksdb-flowfile-repo.adoc
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently, when a FlowFile is written to the FlowFile Repository, the repo
> can either fsync or not, depending on nifi.properties. We should allow a
> third option, of fsync only for CREATE events. In this case, if we receive
> new data from a source we can fsync the update to the FlowFile Repository
> before ACK'ing the data from the source. This allows us to guarantee data
> persistence without the overhead of an fsync for every FlowFile Repository
> update.
> It may make sense, though, to be a bit more selective about when we do this.
> For example, if the source is a system that does not allow us to acknowledge
> the receipt of data, such as a ListenUDP processor, this doesn't really buy
> us much. In such a case, we could be smart about avoiding the high cost of
> an fsync. However, for something like GetSFTP, where we have to remove the
> file in order to 'acknowledge receipt', we can ensure that we wait for the
> fsync before proceeding.
> NOTE: This functionality was ultimately provided in a new implementation
> backed by RocksDB



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
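
For readers who want a concrete picture of the "fsync only for CREATE events" option in the issue description quoted above, here is a minimal Python sketch. It is not NiFi code and does not reflect the RocksDB-backed implementation; `SelectiveSyncJournal`, `EventType`, `receive`, and the `fsync_count` counter are hypothetical names invented for illustration. It assumes a simple append-only journal file where only CREATE records are forced to disk before the source is acknowledged.

```python
import os
from enum import Enum


class EventType(Enum):
    CREATE = "CREATE"
    UPDATE = "UPDATE"
    DELETE = "DELETE"


class SelectiveSyncJournal:
    """Hypothetical append-only journal; NOT NiFi's FlowFile Repository API."""

    def __init__(self, path, sync_on=(EventType.CREATE,)):
        self._file = open(path, "ab")
        self._sync_on = frozenset(sync_on)
        # Counter exists only so the selective behaviour can be observed.
        self.fsync_count = 0

    def append(self, event_type, flowfile_id):
        record = f"{event_type.value}\t{flowfile_id}\n".encode("utf-8")
        self._file.write(record)
        if event_type in self._sync_on:
            # Push Python's buffer to the OS, then ask the OS to persist it.
            self._file.flush()
            os.fsync(self._file.fileno())
            self.fsync_count += 1

    def close(self):
        self._file.flush()
        self._file.close()


def receive(journal, flowfile_id, ack):
    """Persist the CREATE record first, and only then acknowledge the source."""
    journal.append(EventType.CREATE, flowfile_id)
    ack(flowfile_id)
```

With this shape, a source that cannot be acknowledged (the ListenUDP case in the description) could construct the journal with `sync_on=()` and skip the fsync entirely, while a source like GetSFTP would keep the default and delete the remote file only inside `ack`.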