thetumbled commented on PR #4161: URL: https://github.com/apache/bookkeeper/pull/4161#issuecomment-1868853950
> All the changes in this PR are specific to uncommon cases, occurring only when the entrylog file's index is missing.

We have encountered cases where the ledger map in an entry log is missing, possibly because the bookie crashed before flushing the entry log. As long as such a corrupted entry log exists, the bookie scans it with `extractEntryLogMetadataByScanning` to generate the `EntryLogMetadataMap` on every GC run, that is, every `gcWaitTime` milliseconds (default 15 min). The other cases are related to index rebuilding, which is probably rarely used.

> The readFromLogChannel function utilizes BufferedLogChannel, which is a RandomAccessFile channel. When reading data from the file channel, the data is pre-fetched into the OS PageCache, with a default pre-fetch size of 4KB. Even though we only retrieve the entryId, the actual data read from the disk is a minimum of 4KB.

It is common for a single entry to reach 4MB (we set the max batch size of the Pulsar client to 4MB), so this enhancement can reduce disk reads by 90%.

> There is a potential risk associated with this PR:
> We solely read the entryID without validating the entry data. If the file is corrupted, the scan operation will be unable to detect it and will incorrectly populate the index with the wrong ledger size. This introduces a high level of risk.

The fault detection logic is unchanged. The old logic simply reads `entrySize` bytes of data and checks that this many bytes were actually read. With the new read types, `READ_NOTHING` and `READ_LEDGER_ENTRY_ID`, we still advance the read position in the loop and check that we have read the expected amount of data. I don't think this is a breaking change.
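To make the idea discussed above concrete, here is a minimal, self-contained sketch of a header-only scan. It is not the code from this PR: the class `EntryLogScanSketch`, the `ScanResult` type, and the helper names are hypothetical, and the record layout is simplified to `[int entrySize][long ledgerId][long entryId][payload]` with no file header. It only illustrates reading the size and ID fields, skipping the payload, and still failing when the declared size does not fit in the file.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

public class EntryLogScanSketch {

    /** Hypothetical result type: per-ledger byte totals plus bytes requested by the scan. */
    static final class ScanResult {
        final Map<Long, Long> ledgerSizes = new HashMap<>();
        long bytesRead;
    }

    // Assumed simplified layout: [int entrySize][long ledgerId][long entryId][payload].
    // entrySize covers ledgerId + entryId + payload, not the 4-byte size field itself.
    static final int SIZE_FIELD = Integer.BYTES;
    static final int ID_HEADER = 2 * Long.BYTES;

    /** Scans the log, reading only the size and ID fields and skipping every payload. */
    static ScanResult scanHeadersOnly(FileChannel ch) throws IOException {
        ScanResult result = new ScanResult();
        ByteBuffer header = ByteBuffer.allocate(SIZE_FIELD + ID_HEADER);
        long fileSize = ch.size();
        long pos = 0;
        while (pos < fileSize) {
            header.clear();
            int n = readFully(ch, header, pos);
            if (n < header.capacity()) {
                throw new IOException("Short read of entry header at offset " + pos);
            }
            header.flip();
            int entrySize = header.getInt();
            long ledgerId = header.getLong();
            header.getLong(); // entryId: read but not needed for the size totals
            // The payload is never read, but the declared size must still fit in the file.
            if (entrySize < ID_HEADER || pos + SIZE_FIELD + entrySize > fileSize) {
                throw new IOException("Truncated or corrupt entry at offset " + pos);
            }
            result.bytesRead += n;
            result.ledgerSizes.merge(ledgerId, (long) SIZE_FIELD + entrySize, Long::sum);
            pos += SIZE_FIELD + entrySize; // advance past the payload without reading it
        }
        return result;
    }

    /** Positional read that loops until the buffer is full or EOF is reached. */
    private static int readFully(FileChannel ch, ByteBuffer buf, long pos) throws IOException {
        int total = 0;
        while (buf.hasRemaining()) {
            int n = ch.read(buf, pos + total);
            if (n < 0) {
                break;
            }
            total += n;
        }
        return total;
    }

    /** Test helper: appends one entry with a zero-filled payload of the given length. */
    static void appendEntry(FileChannel ch, long ledgerId, long entryId, int payloadLen)
            throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(SIZE_FIELD + ID_HEADER + payloadLen);
        buf.putInt(ID_HEADER + payloadLen).putLong(ledgerId).putLong(entryId);
        buf.position(buf.capacity()); // leave the payload as zeros
        buf.flip();
        while (buf.hasRemaining()) {
            ch.write(buf);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("entrylog-sketch", ".log");
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            appendEntry(ch, 1L, 0L, 1000);
            appendEntry(ch, 1L, 1L, 2000);
            appendEntry(ch, 2L, 0L, 500);
            ScanResult r = scanHeadersOnly(ch);
            System.out.println(r.ledgerSizes + ", requested " + r.bytesRead
                    + " of " + ch.size() + " bytes");
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Note that `bytesRead` counts only the bytes the application requests. As pointed out in the quoted review, the OS may still fetch whole pages from disk, so the real saving depends on how large the entries are relative to the page size.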
