[
https://issues.apache.org/jira/browse/KAFKA-46?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271009#comment-13271009
]
Jun Rao commented on KAFKA-46:
------------------------------
Some comments on the draft.
High level:
1. We should consider whether to have 1 HW checkpoint file per partition vs 1
HW checkpoint file for all partitions. The benefit of the latter is fewer file
writes during checkpoint and fewer file reads during broker startup. Also, to
avoid corrupting the checkpoint file, we should probably write to a tmp file
first and then rename it to the actual checkpoint file. This can probably be
done in a separate JIRA.
2. The benefit of using an ISRExpirationThread is that it's relatively simple
since there is 1 thread doing all the ISR expiration. One drawback I can see is
that idle partitions are still constantly checked by the thread. This may or
may not be a big concern.
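
To illustrate the write-to-tmp-then-rename idea in point 1, here is a minimal
Java sketch using a single checkpoint file for all partitions. It is not taken
from the draft patch: the class name HwCheckpoint, the ".tmp" suffix, and the
one-line-per-partition text format are all assumptions made for the example.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of the patch: one HW checkpoint file
// covering all partitions, written via a tmp file plus rename.
final class HwCheckpoint {

    // Assumed format: one "<topic-partition> <hw>" pair per line.
    static void write(Path target, Map<String, Long> hws) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Long> e : new TreeMap<String, Long>(hws).entrySet()) {
            sb.append(e.getKey()).append(' ').append(e.getValue()).append('\n');
        }
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        ByteBuffer buf = ByteBuffer.wrap(sb.toString().getBytes(StandardCharsets.UTF_8));
        try (FileChannel ch = FileChannel.open(tmp,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            // Flush the tmp file to disk before it becomes visible under
            // the real name.
            ch.force(true);
        }
        // A crash before this line leaves the previous checkpoint intact.
        // Whether an existing target is replaced by ATOMIC_MOVE is
        // platform specific; POSIX rename() replaces it.
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }

    // Reads the whole checkpoint back in a single file read at startup.
    static Map<String, Long> read(Path target) throws IOException {
        Map<String, Long> hws = new HashMap<String, Long>();
        for (String line : Files.readAllLines(target, StandardCharsets.UTF_8)) {
            if (line.isEmpty()) {
                continue;
            }
            int sp = line.lastIndexOf(' ');
            hws.put(line.substring(0, sp), Long.parseLong(line.substring(sp + 1)));
        }
        return hws;
    }
}
```

With this layout a checkpoint costs one file write regardless of the number of
partitions, and a reader never observes a half-written checkpoint file.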
Low level:
3. KafkaApis:
3.1 Agreed with #6 in Prashanth's comment. Probably don't need to call
maybeAddReplicaToISR directly from handleFetchRequest.
3.2 A subtle issue is that we should probably wait until a (replica) fetch
request is successful before updating the follower replica's LEO. This is
because during an unclean failover (no live brokers in ISR), the offset of the
first fetch request from a follower may not be valid.
3.3 We need to update ISR in ZK and in memory atomically since the ISR can be
expanded and shrunk from different threads.
4. Partition:
4.1 We probably don't need to add reassignedReplicas in the patch and can add
it later when we get to KAFKA-42, if necessary.
4.2 We probably don't need both catchUpReplicas and assignedReplicas since we
can always derive one from the other together with the ISR.
4.3 Do we need to maintain a HashMap of <replica_id, Replica>, instead of a
set of replicas for faster lookup? This may not be a big deal since the replica
set is small.
4.4 Should we keep highWatermarkUpdateTime in Log where the HW is stored?
5. Replica:
5.1 leo(): if the log is present, we should return l.leo, not
l.getHighwaterMark.
6. KafkaConfig: All follower related properties should probably be prefixed
with "follower".
7. Log:
7.1 recoverUptoLastCheckpointedHW(): if k+1 log segment files need to be
truncated, we should delete the last k and truncate only the first one.
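
A minimal Java sketch of the rule in 7.1, under stated assumptions: segments
are modelled as half-open logical offset ranges [start, end) sorted by start
offset, and the names LogRecovery, Segment and truncateTo are hypothetical,
not names from the patch. The first segment extending past the HW is truncated
to the HW; every later segment is dropped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not the patch's Log class.
final class LogRecovery {

    // A segment covering offsets [start, end).
    static final class Segment {
        final long start;
        final long end;

        Segment(long start, long end) {
            this.start = start;
            this.end = end;
        }
    }

    // Returns the segments that remain after recovering up to hw.
    // Input segments must be sorted by start offset.
    static List<Segment> truncateTo(List<Segment> segments, long hw) {
        List<Segment> kept = new ArrayList<Segment>();
        boolean truncatedOne = false;
        for (Segment s : segments) {
            if (s.end <= hw) {
                // Entirely at or below the HW: keep untouched.
                kept.add(s);
            } else if (!truncatedOne) {
                // First of the k+1 affected segments: truncate to the HW.
                kept.add(new Segment(s.start, Math.max(s.start, hw)));
                truncatedOne = true;
            }
            // Remaining k affected segments are deleted (not added).
        }
        return kept;
    }
}
```

For example, with segments [0,10), [10,20), [20,30), [30,40) and a HW of 15,
the result is [0,10), [10,15): one segment truncated and the last two deleted.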
> Commit thread, ReplicaFetcherThread for intra-cluster replication
> -----------------------------------------------------------------
>
> Key: KAFKA-46
> URL: https://issues.apache.org/jira/browse/KAFKA-46
> Project: Kafka
> Issue Type: Bug
> Reporter: Jun Rao
> Assignee: Neha Narkhede
> Attachments: kafka-46-draft.patch
>
>
> We need to implement the commit thread at the leader and the fetcher thread
> at the follower for replicating the data from the leader.