Jay Kreps created KAFKA-642:
-------------------------------
Summary: Protocol tweaks for 0.8
Key: KAFKA-642
URL: https://issues.apache.org/jira/browse/KAFKA-642
Project: Kafka
Issue Type: Bug
Reporter: Jay Kreps
There are a couple of things in the protocol that are not idea. It would be
good to tweak these for 0.8 so we start clean.
Here is a set of problems and proposals:
Problems:
1. Correlation id is not used across all the requests. I don't think it can
work as intended because of this.
2. On reflection I am not sure that we need a correlation id field. I think
that since we need to guarantee that processing is sequential on any particular
socket we can correlate with a simple queue. (e.g. as the client sends messages
it adds them to a queue and as it receives responses it just correlates to
whatever is at the head of the queue).
3. The metadata response seems to have a number of problems. Among them is that
it weirdly repeats all the broker information many times. The response includes
the ISR, leader (maybe), and the replicas. Each of these repeat all the broker
information. This is super weird. I think what we should be doing here is
including all broker information for all brokers and then just having the
appropriate ids for the isr, leader, and replicas.
4. For topic discovery I think we need to support the case where no topics are
specified in the metadata request and for this return information about all
topics. I don't think we do this now.
5. I don't understand what the creator id is.
6. The offset request and response is not fully thought through and should be
generalized.
Proposals:
1, 2. Correlation id. This is not strictly speaking needed, but it is maybe
useful for debugging to be able to trace a particular request from client to
server. So we will extend this across all the requests.
3. For metadata response I will try to fix this up by normalizing out the
broker list and having the isr, replicas, and leader field just have the node
id.
4. This should be uncontroversial and easy to add.
5. Let's remove creator id, it isn't used.
6. Let's generalize offset request. My proposal is below:
Rename TopicMetadata API to ClusterMetadata, as this will contain all the data
that is known cluster-wide. Then let's generalize the offset request to be
PartitionMetadata--namely stuff about a particular partition on a particular
server.
The format of PartitionMetdata would be the following:
PartitionMetadataRequest => [TopicName [PartitionId MinSegmentTime
MaxSegmentInfos]]
TopicName => string
PartitionId => uint32
MinSegmentTime => uint64
MaxSegmentInfos => int32
PartitionMetadataResponse => [TopicName [PartitionMetadata]]
TopicName => string
PartitionMetadata => PartitionId LogSize NumberOfSegments LogEndOffset
HighwaterMark [SegmentData]
SegmentData => StartOffset LastModifiedTime
LogSize => uint64
NumberOfSegments => int32
LogEndOffset => int64
HighwaterMark => int64
This would be general enough that we could continue to add to it for any new
pieces of data we need.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira