Hello,

Thanks for your work, which seems promising! I have a lot of comments and questions.
- Avoid splitting the patch in two when patch 2 cannot build without patch 3; you may as well submit just one patch, since it all goes together.
- Have you benchmarked the overhead that is introduced?
- It seems that it is the writeback mode that is implemented? It would be useful to say so in the header.
- For data blocks, I have seen in the jbd2 documentation that some escaping is needed when a data block happens to start with the htobe32(JBD2_MAGIC_NUMBER) magic number: https://www.kernel.org/doc/html/latest/filesystems/ext4/journal.html#data-block
- There are two versions of the journal superblock (1 vs 2); don't we need to support that somehow?
- It would be useful to reorder the functions in the file, to have them in the order they happen: start, dirty, stop, commit, force.
- journal_read_block should pass out_buf to store_read for in-place filling, and only copy if store_read reallocated (which should very rarely happen in practice).
- Why re-read the journal superblock in journal_update_superblock? There is no reason why it would change; you can read it once at mount, and just write the new version.
- The yield() call will lead to performance trouble :) The scheduler is really not forced to actually yield; better use a condition variable, which will behave exactly as you need.
- diskfs_notify_change: is that the only place the call should be? It seems there are many more if (diskfs_synchronous) diskfs_node_update() calls.
- About the journal_force_checkpoint 5-second loop: would we want to combine that with the periodic_sync_thread?
- More generally, it is not clear that you are respecting the ordering of journal writes vs. normal writes. AIUI the principle is that we write to the journal and flush that *before* writing anything normally, so that either we have the old version only, or we also have the journal entry, which we can replay, or we have the new version, and no mixture of these.
- The hardcoded 50 value in journal_commit_transaction seems a bit ugly :) We would probably want a lower limit, but on a fast system 50 might not be enough for proper overlapping performance; we would probably rather compute a value based on the journal size.
- journal_transaction_commit may return E2BIG; we should detect that earlier, to split the transaction.
- There is an ext2_panic call for when the transaction is too big for the journal; can that actually happen? We should probably check for that early, in journal_dirty_block.
- It is not clear to me how the journal gets flushed. The only thing I have seen is that when the journal gets full, we call journal_sync_everything, which syncs everything. AIUI we should normally be flushing the log progressively, as the normal writes get done.

Samuel
