[ https://issues.apache.org/jira/browse/KAFKA-405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13417913#comment-13417913 ]
Jay Kreps commented on KAFKA-405:
---------------------------------

Some of these comments are not directly related to this change, but this is just the first time I have looked at this code in detail:

- I think Log.recoverUptoLastCheckpointedHighWatermark(lastKnownHW: Long) is misnamed. I don't think the log should know anything about high watermarks, and it also isn't doing recovery (e.g. checking the validity of the log); I think it is just blindly truncating the log. Can we change it to Log.truncateTo?
- Is it possible that we need to truncate more than one segment? I.e., couldn't the segment to be truncated be something other than the last segment (unlikely with 1 GB segments, but still a problem)?
- Can we change the API: MessageSet.truncateUpto should be truncateUpTo or truncateTo.
- Can you make a wrapper for the RandomAccessFile called HighwaterMarkCheckpoint and put the logic related to that there? Intuitively, the logic for serializing a checkpoint shouldn't be mixed into ReplicaManager. Can you also document the file format? Is there a reason this can't be plain text? Also, I think a better approach to the updates would be to create a new file, write to it, and then move it over the old one; this would make the update atomic. Not sure if that is needed...
- It would be good to have a test that covers the HighwaterMarkCheckpoint file read and write. Just basic reading/writing, nothing fancy.
- Do we need to version the hw mark checkpoint file format? I.e., maybe the first line of the file is version=x or something... Not sure if that is needed, but I am paranoid about persistent formats after blowing that once and being stuck. This would let format changes be handled automatically.
- Can we fix the setters/getters in Replica.scala and make changes to the leo also update the leoUpdateTime? Currently I think it is incumbent on the caller to do these two things together, which is odd...
- I think the use of KafkaScheduler is not quite right. This is a thread pool meant to execute many tasks.
  There should really only be one for all of Kafka, not one per background thread. You should probably pass in a central instance as an argument rather than creating two new ones.
- Also, I notice that ReplicaManager.getReplica returns an Option, but then every caller needs to check the option and throw an exception if the replica is not found. Can we just have getReplica either return the Replica or throw the exception?
- I think abbreviations should be in the form updateLeo, not updateLEO, and updateIsr, not updateISR. Let's standardize that.

> Improve the high water mark maintenance to store high watermarks for all
> partitions in a single file on disk
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-405
>                 URL: https://issues.apache.org/jira/browse/KAFKA-405
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8
>            Reporter: Neha Narkhede
>            Assignee: Neha Narkhede
>         Attachments: kafka-405-v1.patch
>
> KAFKA-46 introduced per partition leader high watermarks. But it stores those
> in one file per partition. A more performant solution would be to store all
> high watermarks in a single file on disk
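[Editor's sketch] The HighwaterMarkCheckpoint wrapper suggested in the review — a plain-text, versioned file, updated by writing a temp file and atomically renaming it over the old one — could look roughly like the following. The class name comes from the review itself, but the exact file format (a version line, an entry count, then one "topic partition highwatermark" entry per line) and all method signatures are illustrative assumptions, not Kafka's actual implementation:

```scala
import java.io._
import java.nio.file.{Files, StandardCopyOption}

// Hypothetical sketch of the checkpoint wrapper discussed above.
// Assumed plain-text format:
//   line 1: format version (0)
//   line 2: number of entries
//   then one "topic partition highwatermark" entry per line
//   (assumes topic names contain no spaces).
class HighwaterMarkCheckpoint(path: String) {
  private val file = new File(path)

  def write(highwaterMarks: Map[(String, Int), Long]): Unit = {
    // Write to a temp file first, then atomically rename it over the old
    // file, so a crash mid-write never leaves a truncated checkpoint behind.
    val temp = new File(path + ".tmp")
    val writer = new BufferedWriter(new FileWriter(temp))
    try {
      writer.write("0"); writer.newLine()                    // format version
      writer.write(highwaterMarks.size.toString); writer.newLine()
      for (((topic, partition), hw) <- highwaterMarks) {
        writer.write(topic + " " + partition + " " + hw); writer.newLine()
      }
    } finally writer.close()
    Files.move(temp.toPath, file.toPath,
      StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE)
  }

  def read(): Map[(String, Int), Long] = {
    if (!file.exists()) return Map.empty
    val source = scala.io.Source.fromFile(file)
    try {
      val lines = source.getLines().toList
      // Dispatch on the version line, so future format changes can be
      // handled automatically as the review suggests.
      lines.head.trim.toInt match {
        case 0 =>
          val expected = lines(1).trim.toInt
          val entries = lines.drop(2).map { line =>
            val parts = line.split(" ")
            (parts(0), parts(1).toInt) -> parts(2).toLong
          }.toMap
          require(entries.size == expected, "corrupt checkpoint file")
          entries
        case v => throw new IOException("Unrecognized checkpoint version " + v)
      }
    } finally source.close()
  }
}
```

The atomic-rename step is what makes the update crash-safe: readers only ever see either the complete old file or the complete new one, never a partial write.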
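[Editor's sketch] Two of the smaller cleanups in the review — making every leo update also stamp leoUpdateTime, and having getReplica throw instead of returning an Option — could look roughly like this. The class shapes, field names, and the exception type are illustrative assumptions, not the actual Replica.scala or ReplicaManager code:

```scala
// Hypothetical sketch: a custom setter couples the two updates, so callers
// can no longer update the leo and forget the timestamp.
class Replica(val topic: String, val partitionId: Int) {
  private var _logEndOffset: Long = 0L
  private var _leoUpdateTimeMs: Long = -1L

  def logEndOffset: Long = _logEndOffset

  // Assigning to logEndOffset also stamps leoUpdateTimeMs.
  def logEndOffset_=(offset: Long): Unit = {
    _logEndOffset = offset
    _leoUpdateTimeMs = System.currentTimeMillis
  }

  def leoUpdateTimeMs: Long = _leoUpdateTimeMs
}

class ReplicaNotFoundException(msg: String) extends RuntimeException(msg)

class ReplicaManager {
  private val replicas =
    scala.collection.mutable.Map[(String, Int), Replica]()

  def addReplica(r: Replica): Unit =
    replicas((r.topic, r.partitionId)) = r

  // Throws instead of returning Option, so every call site no longer has
  // to repeat the same "None => throw" boilerplate.
  def getReplica(topic: String, partition: Int): Replica =
    replicas.getOrElse((topic, partition),
      throw new ReplicaNotFoundException(
        "No replica for " + topic + "-" + partition))
}
```

This is the standard trade-off the review raises: an Option return pushes error handling to every caller, while throwing centralizes it when "replica missing" is always an error at the call sites.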