A few more responses:

> 101. Compatibility during upgrade: Suppose that the brokers are upgraded to
> the new version, but the broker message format is still the old one. If a
> new producer uses the transaction feature, should the producer get an error
> in this case? A tricky case can be that the leader broker is on the new
> message format, but the follower broker is still on the old message format.
> In this case, the transactional info will be lost in the follower due to
> down conversion. Should we failed the transactional requests when the
> followers are still on the old message format?


We've added some more details to the document about migration. Please take
a look. Two points worth mentioning:

1. Replicas currently take the message format used by the leader. As long
as users do the usual procedure of two rolling bounces, it should be safe
to upgrade the message format.

2. There is no way to support idempotent or transactional features if we
downgrade the message format in the produce request handler. We've modified
the design document to only permit message downgrades if the producer has
disabled idempotence. Otherwise, we will return an
UNSUPPORTED_FOR_MESSAGE_FORMAT error.

110. Transaction log:
> 110.1 "Key => Version AppID Version" It seems that Version should really be
> Type?
> 110.2 "Value => Version Epoch Status ExpirationTime [Topic Partition]"
> Should we store [Topic [Partition]] instead?
> 110.3 To expire an AppId, do we need to insert a tombstone with the expired
> AppID as the key to physically remove the existing AppID entries in the
> transaction log?


Fixed in the document. For 110.3, yes, we need to insert a tombstone after
the AppID has expired. This will work in much the same way as the consumer
coordinator expires offsets using a periodic task.

116. ProducerRequest: The existing format doesn't have "MessageSetSize" at
> the partition level.


This was intentional, but it is easy to overlook. The idea is to modify the
ProduceRequest so that only one message set is included for each partition.
Since the message set contains its own length field, it seemed unnecessary
to have a separate field. The justification for this change was to make the
produce request atomic. With only a single message set for each partition,
either it will be written successfully or not, so an error in the response
will be unambiguous. We are uncertain whether there are legitimate use
cases that require producing smaller message sets in the ProduceRequest, so
we would love to hear feedback on this.

Thanks,
Jason

On Fri, Jan 27, 2017 at 4:21 PM, Apurva Mehta <apu...@confluent.io> wrote:

> Hi again Jun,
>
> I have update the document to address your comments below, but including
> the responses inline to make it easier for everyone to stay on top of the
> conversation.
>
>
>
> > 106. Compacted topics.
> > 106.1. When all messages in a transaction are removed, we could remove
> the
> > commit/abort marker for that transaction too. However, we have to be a
> bit
> > careful. If the marker is removed too quickly, it's possible for a
> consumer
> > to see a message in that transaction, but not to see the marker, and
> > therefore will be stuck in that transaction forever. We have a similar
> > issue when dealing with tombstones. The solution is to preserve the
> > tombstone for at least a preconfigured amount of time after the cleaning
> > has passed the tombstone. Then, as long as a consumer can finish reading
> to
> > the cleaning point within the configured amount of time, it's guaranteed
> > not to miss the tombstone after it has seen a non-tombstone message on
> the
> > same key. I am wondering if we should do something similar here.
> >
>
> This is a good point. As we discussed offline, the solution for the removal
> of control messages will be the same as the solution for problem of
> tombstone removal documented in
> https://issues.apache.org/jira/browse/KAFKA-4545.
>
> 106.2. "To address this problem, we propose to preserve the last epoch and
> > sequence number written by each producer for a fixed amount of time as an
> > empty message set. This is allowed by the new message format we are
> > proposing in this document. The time to preserve the sequence number will
> > be governed by the log retention settings. " Could you be a bit more
> > specific on what retention time will be used since by default, there is
> no
> > retention time for compacted (but not delete) topic?
> >
>
> We discussed this offline, and the consensus that it is reasonable to use
> brokers global log.retention.* settings for these messages.
>
>
> > 106.3 "As for control messages, if the broker does not have any
> > corresponding transaction cached with the PID when encountering a control
> > message, that message can be safely removed."
> > Do controlled messages have keys? If not, do we need to relax the
>
> constraint that messages in a compacted topic must have keys?
> >
>
> The key of a control messages is the control message type. As such, regular
> compaction logic based on key will not apply to control messages. We will
> have to update the log cleaner to ignore messages which have the control
> message bit set.
>
> Control messages can be removed at some point after the last messages of
> the corresponding transaction are removed. As suggested in KAFKA-4545, we
> can use the timestamp associated with the log segment to deduce the safe
> expiration time for control messages in that segment.
>
>
>
> > 112. Control message: Will control messages be used for timestamp
> indexing?
> > If so, what timestamp will we use if the timestamp type is creation time?
> >
> >
> Control messages will not be used for timestamp indexing. Each control
> message will have the log append time for the timestamp, but these messages
> will be ignored when building the timestamp index. Since control messages
> are for system use only and will never be exposed to users, it doesn't make
> sense to include them in the timestamp index.
>
> Further, as you mentioned, when a topic uses creation time, it is
> impossible to ensure that control messages will not skew the time based
> index, since these messages are sent by the transaction coordinator which
> has no notion of the application level message creation time.
>
> Thanks,
> Apurva
>

Reply via email to