[jira] [Commented] (KAFKA-46) Commit thread, ReplicaFetcherThread for intra-cluster replication

Jun Rao (JIRA) Tue, 08 May 2012 19:06:14 -0700

    [ 
https://issues.apache.org/jira/browse/KAFKA-46?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271009#comment-13271009
 ]


Jun Rao commented on KAFKA-46:
------------------------------

Some comments on the draft.

High level:
1. We should consider whether to have 1 HW checkpoint file per partition vs 1 
HW checkpoint file for all partitions. The benefit of the latter is fewer file 
writes during checkpoint and fewer file reads during broker startup. Also, to 
avoid corrupting the checkpoint file, we should probably first write to a tmp 
file and then rename it to the actual checkpoint file. This can probably be 
done in a separate jira.

2. The benefit of using an ISRExpirationThread is that it's relatively simple 
since there is 1 thread doing all the ISR expiration. One drawback I can see is 
that idle partitions are still constantly checked by the thread. This may or 
may not be a big concern.
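
A rough sketch of the tmp-file-then-rename idea from point 1, using a single 
checkpoint file for all partitions. This is only an illustration in Python, not 
the patch's code: the function names, the "topic partition hw" line format and 
the dict layout are all assumptions made for the example.

```python
import os
import tempfile


def write_hw_checkpoint(path, high_watermarks):
    """Write every partition's HW to one file via a tmp file and a rename.

    high_watermarks maps (topic, partition) -> offset (hypothetical layout).
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the tmp file in the same directory so the rename stays on one
    # filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            for (topic, partition), hw in sorted(high_watermarks.items()):
                f.write(f"{topic} {partition} {hw}\n")
            f.flush()
            os.fsync(f.fileno())
        # Readers see either the old file or the new one, never a partial write.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def read_hw_checkpoint(path):
    """Read the checkpoint back into a (topic, partition) -> offset dict."""
    result = {}
    with open(path) as f:
        for line in f:
            topic, partition, hw = line.split()
            result[(topic, int(partition))] = int(hw)
    return result
```

With one file for all partitions, broker startup needs a single 
read_hw_checkpoint call instead of one read per partition.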

Low level:
3. KafkaApis:
3.1 Agreed with #6 in Prashanth's comment. We probably don't need to call 
maybeAddReplicaToISR directly from handleFetchRequest.
3.2 A subtle issue is that we should probably wait until a (replica) fetch 
request is successful before updating the follower replica's LEO. This is 
because during an unclean failover (no live brokers in ISR), the offset of the 
first fetch request from a follower may not be valid.
3.3 We need to update ISR in ZK and in memory atomically since the ISR can be 
expanded and shrunk from different threads.
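
To illustrate 3.3, here is a minimal sketch of guarding the ZK write and the 
in-memory update with one lock, so an expand from the fetch path and a shrink 
from the expiration thread cannot interleave. It is written in Python for 
brevity; PartitionIsr, its methods and the dict standing in for a ZK client are 
hypothetical names, not part of the patch.

```python
import threading


class PartitionIsr:
    """Hypothetical holder for one partition's ISR; not Kafka's actual class."""

    def __init__(self, zk_store, path, initial_isr):
        self._lock = threading.Lock()
        self._zk = zk_store  # stand-in for a ZK client: any dict-like store
        self._path = path
        self._isr = frozenset(initial_isr)
        self._zk[path] = sorted(self._isr)

    def _update(self, new_isr):
        # Caller holds self._lock. Write to ZK first and swap the in-memory
        # copy only afterwards, so a failed ZK write leaves memory unchanged.
        self._zk[self._path] = sorted(new_isr)
        self._isr = frozenset(new_isr)

    def expand(self, replica_id):
        with self._lock:
            if replica_id not in self._isr:
                self._update(self._isr | {replica_id})

    def shrink(self, replica_ids):
        with self._lock:
            remaining = self._isr - set(replica_ids)
            if remaining != self._isr:
                self._update(remaining)

    def current(self):
        with self._lock:
            return self._isr
```

Holding the lock across the ZK write is the simplest choice; whether that 
latency is acceptable on the fetch path is a separate question.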

4. Partition:
4.1 We probably don't need to add reassignedReplicas in the patch and can add 
it later when we get to kafka-42, if necessary.
4.2 We probably don't need both catchUpReplicas and assignedReplicas since we 
can always derive one from the other together with the ISR.
4.3 Do we need to maintain a HashMap of <replica_id, Replica>, instead of a 
set of replicas, for faster lookup? This may not be a big deal since the 
replica set is small.
4.4 Should we keep highWatermarkUpdateTime in Log where the HW is stored?
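
As a small illustration of 4.2, assuming the catch-up replicas are exactly the 
assigned replicas that are not in the ISR (my reading, not something the patch 
states), either set can be computed from the other. The function names are 
made up for the example.

```python
def catch_up_replicas(assigned_replicas, isr):
    """Assigned replicas that are not in the ISR (assumed definition)."""
    return set(assigned_replicas) - set(isr)


def assigned_replicas_from(catch_up, isr):
    """Inverse direction: assigned replicas are the ISR plus the catch-up set."""
    return set(catch_up) | set(isr)
```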

5. Replica:
5.1 leo(): if the log is present, we should return l.leo, not 
l.getHighwaterMark.

6. KafkaConfig: All follower-related properties should probably be prefixed 
with "follower".

7. Log:
7.1 recoverUptoLastCheckpointedHW(): if k+1 log segment files need to be 
truncated, we should delete the last k and truncate the first one.
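
A sketch of the rule in 7.1, in Python and on a simplified model: each segment 
is a (base_offset, end_offset) pair with an exclusive end, segments are sorted 
and contiguous, and plan_recovery is a hypothetical helper that only computes 
which segments to keep and which to delete. None of this is the patch's code.

```python
def plan_recovery(segments, hw):
    """Return (kept, deleted) segments after truncating the log to hw.

    segments: list of (base_offset, end_offset), sorted, end exclusive.
    """
    # Segments that extend past the checkpointed HW need truncation.
    affected = [i for i, (_, end) in enumerate(segments) if end > hw]
    if not affected:
        return list(segments), []
    first = affected[0]
    base, _ = segments[first]
    # Truncate the first affected segment down to hw and delete the rest.
    kept = list(segments[:first]) + [(base, max(base, hw))]
    deleted = list(segments[first + 1:])
    return kept, deleted
```

For example, with segments (0, 100), (100, 200), (200, 300) and hw = 150, two 
segments are affected (k = 1): (100, 200) is truncated to (100, 150) and 
(200, 300) is deleted.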

                
> Commit thread, ReplicaFetcherThread for intra-cluster replication
> -----------------------------------------------------------------
>
>                 Key: KAFKA-46
>                 URL: https://issues.apache.org/jira/browse/KAFKA-46
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Jun Rao
>            Assignee: Neha Narkhede
>         Attachments: kafka-46-draft.patch
>
>
> We need to implement the commit thread at the leader and the fetcher thread 
> at the follower for replicating the data from the leader.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
