sijie commented on issue #570: Multiple active entrylogs URL: https://github.com/apache/bookkeeper/issues/570#issuecomment-369018351 > Where has this been discussed? I can't find anything on the dev@ list. The discussion has been started since May 2017 and it has been mentioned/discussed at different forms in different email threads, different community meeting or direct meetings. http://mail-archives.apache.org/mod_mbox/bookkeeper-dev/201707.mbox/%3CCAAFz1KPYb4f%2BrUfP2Bhvy%2BigksBLsuXE-QMw6U%2B%3DBDiRXQ19LA%40mail.gmail.com%3E http://mail-archives.apache.org/mod_mbox/bookkeeper-dev/201705.mbox/%3CCAAFz1KNr4uB8RMiXK97gkiqfXCVEoygfjeR2rJKcddvF9ZM1yg%40mail.gmail.com%3E > There's a big difference. Making the change in the entrylogger itself, means that the implementation gets pulled into SortedLedgerStorage, and that means he has to deal with the EntryMemTable, which should have exactly nothing to do with the implementation. In order to fully leverage "per ledger" characteristics, parallel flushing is a necessary change. But it doesn't mean the change in EntryMemTable is only dedicated to per ledger entry log manager. It can be useful for single entry log manager. I requested Charan to not enable parallel flushing feature for single entry log manager until we have confident on that change. And that's why there is boolean flags around that piece. This change here is actually taking `SortedLedgerStorage` in a right direction: 1) MemTable does cache and sorting, you can have different flushing policy (sequential vs parallel) 2) EntryLogManager manages entry log files, when to create them, how to rotate them and how to flush/checkpoint them. 3) EntryLogger becomes the entrypoint to append entries, where the complexity of managing files are handed over to entry log manager. This change takes the effort to allow us provide a better modularized implementation of SortedLedgerStorage, which it is the default storage implementation for most of bookkeeper users. > Shipping is a function of many things, not just getting code into branch. I completely agreed with you. and this change is not "getting code into branch" if you have followed the discussion since May last year. > 3000 LOC is never easy to review. 300-400 LOC is the max that can be reviewed competently in an hour, any more, people start skimming. I wrote an email about this in october[1], so I won't go through it again. I agreed with you - smaller patches are much easier to review and get changes in. However this is a very subjective thing that varies on different things and to different people. sometimes reviewers won't see a clear picture if a change is small and they will get confused; reviewers will be overwhelmed if they don't follow closely on the discussion if the change is big. so both cases can exist and this process can not really be quantified. What we have to do is to follow what we called guidelines or practices and encourage people to send changes in small batches. However there can still be exceptions. for example, code changes has been made before (like changes have been made in branches and run on production. for example, the merges come from branches). In this case, it doesn't make any sense for the people cherry-picking the change to redo the changes by breaking down into small patches. I can't speak for all the cases. But for the case here, this feature has been discussed since May last year, and the change has been reviewed/discussed multiple times internally by Salesforce folks and externally by me during this long discussion and review recycle. so I am fine with reviewing the change in its current form.
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services