Joel,

Thanks for the comments. I updated the KIP page and added the canary procedure.
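For the staged rollout and canary discussed in the quoted thread below, here is a minimal illustration of the two broker settings involved. The names follow the thread (message.format.version) and the existing inter.broker.protocol.version config, but the exact keys and version strings are assumptions, not a definitive recipe.

import java.util.Properties

// Hedged sketch of the per-phase broker settings discussed in this thread.
object RolloutPhases {
  private def phase(interBrokerProtocol: String, messageFormat: String): Properties = {
    val p = new Properties()
    p.setProperty("inter.broker.protocol.version", interBrokerProtocol)
    p.setProperty("message.format.version", messageFormat)
    p
  }

  // Phase 1: brokers understand the new request versions but keep writing the
  // old on-disk format, so existing consumers need no down-conversion.
  val phase1: Properties = phase("0.10.0", "0.9.0")

  // Phase 2 (or a single canary broker first): switch the on-disk format; any
  // remaining old consumers are down-converted on fetch.
  val phase2: Properties = phase("0.10.0", "0.10.0")
}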
Thanks, Jiangjie (Becket) Qin On Wed, Sep 30, 2015 at 6:26 PM, Joel Koshy <jjkosh...@gmail.com> wrote: > The Phase 2 2.* sub-steps don't seem to be right. Can you look over > that carefully? Also, "definitive" - you mean "absolute" i.e., not > relative offsets right? > > One more thing that may be worth mentioning is that it is technically > possible to canary the new version format on at most one broker (or > multiple if it hosts mutually disjoint partitions). Basically turn on > the new message format on one broker, leave it on for an extended > period - if we hit some unanticipated bug and something goes terribly > wrong with the feature then just kill that broker, switch it to the v0 > on-disk format and reseed it from the leaders. Most people may not > want to have such a long deployment plan but at least it is an option > for those who want to tread very carefully given that it is backwards > incompatible. > > Joel > > On Tue, Sep 29, 2015 at 4:50 PM, Jiangjie Qin <j...@linkedin.com.invalid> > wrote: > > Hi Joel and other folks. > > > > I updated the KIP page with the two phase roll out, which avoids the > > conversion for majority of users. > > > > To do that we need to add a message.format.version configuration to > broker. > > Other than that there is no interface change from the previous proposal. > > Please let me know if you have concern about the updated proposal. > > > > Thanks, > > > > Jiangjie (Becket) Qin > > > > On Fri, Sep 25, 2015 at 11:26 AM, Joel Koshy <jjkosh...@gmail.com> > wrote: > > > >> Hey Becket, > >> > >> I do think we need the interim deployment phase, set > >> message.format.version and down-convert for producer request v2. > >> Down-conversion for v2 is no worse than what the broker is doing now. > >> I don't think we want a prolonged phase where we down-convert for > >> every v1 fetch - in fact I'm less concerned about losing zero-copy for > >> those fetch requests than the overhead of decompress/recompress for > >> those fetches as that would increase your CPU usage by 4x, 5x or > >> whatever the average consumer fan-out is. The > >> decompression/recompression will put further memory pressure as well. > >> > >> It is true that clients send the latest request version that it is > >> compiled with and that does not need to change. The broker can > >> continue to send back with zero-copy for fetch request version 2 as > >> well (even if during the interim phase during which it down-converts > >> producer request v2). The consumer iterator (for old consumer) or the > >> Fetcher (for new consumer) needs to be able to handle messages that > >> are in original as well as new (relative offset) format. > >> > >> Thanks, > >> > >> Joel > >> > >> > >> On Thu, Sep 24, 2015 at 7:56 PM, Jiangjie Qin <j...@linkedin.com.invalid > > > >> wrote: > >> > Hi Joel, > >> > > >> > That is a valid concern. And that is actually why we had the > >> > message.format.version before. > >> > > >> > My original thinking was: > >> > 1. upgrade the broker to support both V1 and V2 for consumer/producer > >> > request. > >> > 2. configure broker to store V1 on the disk. (message.format.version > = 1) > >> > 3. upgrade the consumer to support both V1 and V2 for consumer > request. > >> > 4. Meanwhile some producer might also be upgraded to use producer > request > >> > V2. > >> > 5. At this point, for producer request V2, broker will do down > >> conversion. > >> > Regardless consumers are upgraded or not, broker will always use > >> zero-copy > >> > transfer. 
Because supposedly both old and upgraded consumer should be > >> able > >> > to understand that. > >> > 6. After most of the consumers are upgraded, We set > >> message.format.version > >> > = 1 and only do down conversion for old consumers. > >> > > >> > This way we don't need to reject producer request V2. And we always to > >> > version conversion for the minority of the consumers. However I have a > >> few > >> > concerns over this approach, not sure if they actually matters. > >> > > >> > A. (5) is not true for now. Today the clients only uses the highest > >> > version, i.e. a producer/consumer wouldn't parse a lower version of > >> > response even the code exist there. I think supposedly, consumer > should > >> > stick to one version and broker should do the conversion. > >> > B. Let's say (A) is not a concern, we make all the clients support all > >> the > >> > versions it knows. At step(6), there will be a transitional period > that > >> > user will see both messages with new and old version. For KIP-31 only > it > >> > might be OK because we are not adding anything into the message. But > if > >> the > >> > message has different fields (e.g. KIP-32), that means people will get > >> > those fields from some messages but not from some other messages. > Would > >> > that be a problem? > >> > > >> > If (A) and (B) are not a problem. Is the above procedure able to > address > >> > your concern? > >> > > >> > Thanks, > >> > > >> > Jiangjie (Becket) Qin > >> > > >> > On Thu, Sep 24, 2015 at 6:32 PM, Joel Koshy <jjkosh...@gmail.com> > wrote: > >> > > >> >> The upgrade plan works, but the potentially long interim phase of > >> >> skipping zero-copy for down-conversion could be problematic > especially > >> >> for large deployments with large consumer fan-out. It is not only > >> >> going to be memory overhead but CPU as well - since you need to > >> >> decompress, write absolute offsets, then recompress for every v1 > >> >> fetch. i.e., it may be safer (but obviously more tedious) to have a > >> >> multi-step upgrade process. For e.g.,: > >> >> > >> >> 1 - Upgrade brokers, but disable the feature. i.e., either reject > >> >> producer requests v2 or down-convert to old message format (with > >> >> absolute offsets) > >> >> 2 - Upgrade clients, but they should only use v1 requests > >> >> 3 - Switch (all or most) consumers to use v2 fetch format (which will > >> >> use zero-copy). > >> >> 4 - Turn on the feature on the brokers to allow producer requests v2 > >> >> 5 - Switch producers to use v2 produce format > >> >> > >> >> (You may want a v1 fetch rate metric and decide to proceed to step 4 > >> >> only when that comes down to a trickle) > >> >> > >> >> I'm not sure if the prolonged upgrade process is viable in every > >> >> scenario. I think it should work at LinkedIn for e.g., but may not > for > >> >> other environments. > >> >> > >> >> Joel > >> >> > >> >> > >> >> On Tue, Sep 22, 2015 at 12:55 AM, Jiangjie Qin > >> >> <j...@linkedin.com.invalid> wrote: > >> >> > Thanks for the explanation, Jay. > >> >> > Agreed. We have to keep the offset to be the offset of last inner > >> >> message. > >> >> > > >> >> > Jiangjie (Becket) Qin > >> >> > > >> >> > On Mon, Sep 21, 2015 at 6:21 PM, Jay Kreps <j...@confluent.io> > wrote: > >> >> > > >> >> >> For (3) I don't think we can change the offset in the outer > message > >> from > >> >> >> what it is today as it is relied upon in the search done in the > log > >> >> layer. 
> >> >> >> The reason it is the offset of the last message rather than the > first > >> >> is to > >> >> >> make the offset a least upper bound (i.e. the smallest offset >= > >> >> >> fetch_offset). This needs to work the same for both gaps due to > >> >> compacted > >> >> >> topics and gaps due to compressed messages. > >> >> >> > >> >> >> So imagine you had a compressed set with offsets {45, 46, 47, 48} > if > >> you > >> >> >> assigned this compressed set the offset 45 a fetch for 46 would > >> actually > >> >> >> skip ahead to 49 (the least upper bound). > >> >> >> > >> >> >> -Jay > >> >> >> > >> >> >> On Mon, Sep 21, 2015 at 5:17 PM, Jun Rao <j...@confluent.io> > wrote: > >> >> >> > >> >> >> > Jiangjie, > >> >> >> > > >> >> >> > Thanks for the writeup. A few comments below. > >> >> >> > > >> >> >> > 1. We will need to be a bit careful with fetch requests from the > >> >> >> followers. > >> >> >> > Basically, as we are doing a rolling upgrade of the brokers, the > >> >> follower > >> >> >> > can't start issuing V2 of the fetch request until the rest of > the > >> >> brokers > >> >> >> > are ready to process it. So, we probably need to make use of > >> >> >> > inter.broker.protocol.version to do the rolling upgrade. In step > >> 1, we > >> >> >> set > >> >> >> > inter.broker.protocol.version to 0.9 and do a round of rolling > >> >> upgrade of > >> >> >> > the brokers. At this point, all brokers are capable of > processing > >> V2 > >> >> of > >> >> >> > fetch requests, but no broker is using it yet. In step 2, we > >> >> >> > set inter.broker.protocol.version to 0.10 and do another round > of > >> >> rolling > >> >> >> > restart of the brokers. In this step, the upgraded brokers will > >> start > >> >> >> > issuing V2 of the fetch request. > >> >> >> > > >> >> >> > 2. If we do #1, I am not sure if there is still a need for > >> >> >> > message.format.version since the broker can start writing > messages > >> in > >> >> the > >> >> >> > new format after inter.broker.protocol.version is set to 0.10. > >> >> >> > > >> >> >> > 3. It wasn't clear from the wiki whether the base offset in the > >> >> shallow > >> >> >> > message is the offset of the first or the last inner message. > It's > >> >> better > >> >> >> > to use the offset of the last inner message. This way, the > >> followers > >> >> >> don't > >> >> >> > have to decompress messages to figure out the next fetch offset. > >> >> >> > > >> >> >> > 4. I am not sure that I understand the following sentence in the > >> >> wiki. It > >> >> >> > seems that the relative offsets in a compressed message don't > have > >> to > >> >> be > >> >> >> > consecutive. If so, why do we need to update the relative > offsets > >> in > >> >> the > >> >> >> > inner messages? > >> >> >> > "When the log cleaner compacts log segments, it needs to update > the > >> >> inner > >> >> >> > message's relative offset values." > >> >> >> > > >> >> >> > Thanks, > >> >> >> > > >> >> >> > Jun > >> >> >> > > >> >> >> > On Thu, Sep 17, 2015 at 12:54 PM, Jiangjie Qin > >> >> <j...@linkedin.com.invalid > >> >> >> > > >> >> >> > wrote: > >> >> >> > > >> >> >> > > Hi folks, > >> >> >> > > > >> >> >> > > Thanks a lot for the feedback on KIP-31 - move to use relative > >> >> offset. > >> >> >> > (Not > >> >> >> > > including timestamp and index discussion). > >> >> >> > > > >> >> >> > > I updated the migration plan section as we discussed on KIP > >> >> hangout. I > >> >> >> > > think it is the only concern raised so far. 
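Jay's {45, 46, 47, 48} example above can be made concrete with a small sketch (not broker code; names are illustrative) of the least-upper-bound lookup the log layer relies on:

object LeastUpperBoundSketch {
  // Each wrapper (shallow message) is (assignedOffset, innerOffsets).
  case class Wrapper(assignedOffset: Long, innerOffsets: Seq[Long])

  // Log layer lookup: first wrapper whose assigned offset is >= fetchOffset.
  def locate(log: Seq[Wrapper], fetchOffset: Long): Option[Wrapper] =
    log.find(_.assignedOffset >= fetchOffset)

  def main(args: Array[String]): Unit = {
    val inner = Seq(45L, 46L, 47L, 48L)

    // Wrapper keeps the offset of the LAST inner message (what the thread converges on):
    val lastBased = Seq(Wrapper(48L, inner), Wrapper(52L, Seq(49L, 50L, 51L, 52L)))
    println(locate(lastBased, 46L).map(_.assignedOffset))  // Some(48): the set that contains 46

    // Wrapper keeps the offset of the FIRST inner message (the alternative ruled out above):
    val firstBased = Seq(Wrapper(45L, inner), Wrapper(49L, Seq(49L, 50L, 51L, 52L)))
    println(locate(firstBased, 46L).map(_.assignedOffset)) // Some(49): skips 46-48
  }
}

Assigning the wrapper the last inner offset keeps a fetch for 46 inside the set that actually contains 46; assigning it the first inner offset would skip ahead to 49.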
Please let me > know if > >> >> there > >> >> >> > are > >> >> >> > > further comments about the KIP. > >> >> >> > > > >> >> >> > > Thanks, > >> >> >> > > > >> >> >> > > Jiangjie (Becket) Qin > >> >> >> > > > >> >> >> > > On Mon, Sep 14, 2015 at 5:13 PM, Jiangjie Qin < > j...@linkedin.com > >> > > >> >> >> wrote: > >> >> >> > > > >> >> >> > > > I just updated the KIP-33 to explain the indexing on > CreateTime > >> >> and > >> >> >> > > > LogAppendTime respectively. I also used some use case to > >> compare > >> >> the > >> >> >> > two > >> >> >> > > > solutions. > >> >> >> > > > Although this is for KIP-33, but it does give a some > insights > >> on > >> >> >> > whether > >> >> >> > > > it makes sense to have a per message LogAppendTime. > >> >> >> > > > > >> >> >> > > > > >> >> >> > > > >> >> >> > > >> >> >> > >> >> > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-33+-+Add+a+time+based+log+index > >> >> >> > > > > >> >> >> > > > As a short summary of the conclusions we have already > reached > >> on > >> >> >> > > timestamp: > >> >> >> > > > 1. It is good to add a timestamp to the message. > >> >> >> > > > 2. LogAppendTime should be used for broker policy > enforcement > >> (Log > >> >> >> > > > retention / rolling) > >> >> >> > > > 3. It is useful to have a CreateTime in message format, > which > >> is > >> >> >> > > immutable > >> >> >> > > > after producer sends the message. > >> >> >> > > > > >> >> >> > > > There are following questions still in discussion: > >> >> >> > > > 1. Should we also add LogAppendTime to message format? > >> >> >> > > > 2. which timestamp should we use to build the index. > >> >> >> > > > > >> >> >> > > > Let's talk about question 1 first because question 2 is > >> actually a > >> >> >> > follow > >> >> >> > > > up question for question 1. > >> >> >> > > > Here are what I think: > >> >> >> > > > 1a. To enforce broker log policy, theoretically we don't > need > >> >> >> > per-message > >> >> >> > > > LogAppendTime. If we don't include LogAppendTime in > message, we > >> >> still > >> >> >> > > need > >> >> >> > > > to implement a separate solution to pass log segment > timestamps > >> >> among > >> >> >> > > > brokers. That means if we don't include the LogAppendTime in > >> >> message, > >> >> >> > > there > >> >> >> > > > will be further complication in replication. > >> >> >> > > > 1b. LogAppendTime has some advantage over CreateTime (KIP-33 > >> has > >> >> >> detail > >> >> >> > > > comparison) > >> >> >> > > > 1c. We have already exposed offset, which is essentially an > >> >> internal > >> >> >> > > > concept of message in terms of position. Exposing > LogAppendTime > >> >> means > >> >> >> > we > >> >> >> > > > expose another internal concept of message in terms of time. > >> >> >> > > > > >> >> >> > > > Considering the above reasons, personally I think it worth > >> adding > >> >> the > >> >> >> > > > LogAppendTime to each message. > >> >> >> > > > > >> >> >> > > > Any thoughts? > >> >> >> > > > > >> >> >> > > > Thanks, > >> >> >> > > > > >> >> >> > > > Jiangjie (Becket) Qin > >> >> >> > > > > >> >> >> > > > On Mon, Sep 14, 2015 at 11:44 AM, Jiangjie Qin < > >> j...@linkedin.com > >> >> > > >> >> >> > > wrote: > >> >> >> > > > > >> >> >> > > >> I was trying to send last email before KIP hangout so maybe > >> did > >> >> not > >> >> >> > > think > >> >> >> > > >> it through completely. By the way, the discussion is > actually > >> >> more > >> >> >> > > related > >> >> >> > > >> to KIP-33, i.e. 
whether we should index on CreateTime or > >> >> >> > LogAppendTime. > >> >> >> > > >> (Although it seems all the discussion are still in this > >> mailing > >> >> >> > > thread...) > >> >> >> > > >> This solution in last email is for indexing on CreateTime. > It > >> is > >> >> >> > > >> essentially what Jay suggested except we use a timestamp > map > >> >> instead > >> >> >> > of > >> >> >> > > a > >> >> >> > > >> memory mapped index file. Please ignore the proposal of > using > >> a > >> >> log > >> >> >> > > >> compacted topic. The solution can be simplified to: > >> >> >> > > >> > >> >> >> > > >> Each broker keeps > >> >> >> > > >> 1. a timestamp index map - Map[TopicPartitionSegment, > >> >> Map[Timestamp, > >> >> >> > > >> Offset]]. The timestamp is on minute boundary. > >> >> >> > > >> 2. A timestamp index file for each segment. > >> >> >> > > >> When a broker receives a message (both leader or > follower), it > >> >> >> checks > >> >> >> > if > >> >> >> > > >> the timestamp index map contains the timestamp for current > >> >> segment. > >> >> >> > The > >> >> >> > > >> broker add the offset to the map and append an entry to the > >> >> >> timestamp > >> >> >> > > index > >> >> >> > > >> if the timestamp does not exist. i.e. we only use the index > >> file > >> >> as > >> >> >> a > >> >> >> > > >> persistent copy of the index timestamp map. > >> >> >> > > >> > >> >> >> > > >> When a log segment is deleted, we need to: > >> >> >> > > >> 1. delete the TopicPartitionKeySegment key in the timestamp > >> index > >> >> >> map. > >> >> >> > > >> 2. delete the timestamp index file > >> >> >> > > >> > >> >> >> > > >> This solution assumes we only keep CreateTime in the > message. > >> >> There > >> >> >> > are > >> >> >> > > a > >> >> >> > > >> few trade-offs in this solution: > >> >> >> > > >> 1. The granularity of search will be per minute. > >> >> >> > > >> 2. All the timestamp index map has to be in the memory all > the > >> >> time. > >> >> >> > > >> 3. We need to think about another way to honor log > retention > >> time > >> >> >> and > >> >> >> > > >> time-based log rolling. > >> >> >> > > >> 4. We lose the benefit brought by including LogAppendTime > in > >> the > >> >> >> > message > >> >> >> > > >> mentioned earlier. > >> >> >> > > >> > >> >> >> > > >> I am not sure whether this solution is necessarily better > than > >> >> >> > indexing > >> >> >> > > >> on LogAppendTime. > >> >> >> > > >> > >> >> >> > > >> I will update KIP-33 to explain the solution to index on > >> >> CreateTime > >> >> >> > and > >> >> >> > > >> LogAppendTime respectively and put some more concrete use > >> cases > >> >> as > >> >> >> > well. > >> >> >> > > >> > >> >> >> > > >> Thanks, > >> >> >> > > >> > >> >> >> > > >> Jiangjie (Becket) Qin > >> >> >> > > >> > >> >> >> > > >> > >> >> >> > > >> On Mon, Sep 14, 2015 at 9:40 AM, Jiangjie Qin < > >> j...@linkedin.com > >> >> > > >> >> >> > > wrote: > >> >> >> > > >> > >> >> >> > > >>> Hi Joel, > >> >> >> > > >>> > >> >> >> > > >>> Good point about rebuilding index. I agree that having a > per > >> >> >> message > >> >> >> > > >>> LogAppendTime might be necessary. About time adjustment, > the > >> >> >> solution > >> >> >> > > >>> sounds promising, but it might be better to make it as a > >> follow > >> >> up > >> >> >> of > >> >> >> > > the > >> >> >> > > >>> KIP because it seems a really rare use case. > >> >> >> > > >>> > >> >> >> > > >>> I have another thought on how to manage the out of order > >> >> >> timestamps. 
> >> >> >> > > >>> Maybe we can do the following: > >> >> >> > > >>> Create a special log compacted topic __timestamp_index > >> similar > >> >> to > >> >> >> > > topic, > >> >> >> > > >>> the key would be (TopicPartition, > >> TimeStamp_Rounded_To_Minute), > >> >> the > >> >> >> > > value > >> >> >> > > >>> is offset. In memory, we keep a map for each > TopicPartition, > >> the > >> >> >> > value > >> >> >> > > is > >> >> >> > > >>> (timestamp_rounded_to_minute -> > >> smallest_offset_in_the_minute). > >> >> >> This > >> >> >> > > way we > >> >> >> > > >>> can search out of order message and make sure no message > is > >> >> >> missing. > >> >> >> > > >>> > >> >> >> > > >>> Thoughts? > >> >> >> > > >>> > >> >> >> > > >>> Thanks, > >> >> >> > > >>> > >> >> >> > > >>> Jiangjie (Becket) Qin > >> >> >> > > >>> > >> >> >> > > >>> On Fri, Sep 11, 2015 at 12:46 PM, Joel Koshy < > >> >> jjkosh...@gmail.com> > >> >> >> > > >>> wrote: > >> >> >> > > >>> > >> >> >> > > >>>> Jay had mentioned the scenario of mirror-maker bootstrap > >> which > >> >> >> would > >> >> >> > > >>>> effectively reset the logAppendTimestamps for the > >> bootstrapped > >> >> >> data. > >> >> >> > > >>>> If we don't include logAppendTimestamps in each message > >> there > >> >> is a > >> >> >> > > >>>> similar scenario when rebuilding indexes during recovery. > >> So it > >> >> >> > seems > >> >> >> > > >>>> it may be worth adding that timestamp to messages. The > >> >> drawback to > >> >> >> > > >>>> that is exposing a server-side concept in the protocol > >> >> (although > >> >> >> we > >> >> >> > > >>>> already do that with offsets). logAppendTimestamp really > >> >> should be > >> >> >> > > >>>> decided by the broker so I think the first scenario may > have > >> >> to be > >> >> >> > > >>>> written off as a gotcha, but the second may be worth > >> addressing > >> >> >> (by > >> >> >> > > >>>> adding it to the message format). > >> >> >> > > >>>> > >> >> >> > > >>>> The other point that Jay raised which needs to be > addressed > >> >> (since > >> >> >> > we > >> >> >> > > >>>> require monotically increasing timestamps in the index) > in > >> the > >> >> >> > > >>>> proposal is changing time on the server (I'm a little > less > >> >> >> concerned > >> >> >> > > >>>> about NTP clock skews than a user explicitly changing the > >> >> server's > >> >> >> > > >>>> time - i.e., big clock skews). We would at least want to > >> "set > >> >> >> back" > >> >> >> > > >>>> all the existing timestamps to guarantee non-decreasing > >> >> timestamps > >> >> >> > > >>>> with future messages. I'm not sure at this point how > best to > >> >> >> handle > >> >> >> > > >>>> that, but we could perhaps have a epoch/base-time (or > >> >> >> > time-correction) > >> >> >> > > >>>> stored in the log directories and base all log index > >> timestamps > >> >> >> off > >> >> >> > > >>>> that base-time (or corrected). So if at any time you > >> determine > >> >> >> that > >> >> >> > > >>>> time has changed backwards you can adjust that base-time > >> >> without > >> >> >> > > >>>> having to fix up all the entries. Without knowing the > exact > >> >> diff > >> >> >> > > >>>> between the previous clock and new clock we cannot adjust > >> the > >> >> >> times > >> >> >> > > >>>> exactly, but we can at least ensure increasing > timestamps. 
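A minimal, in-memory-only sketch of the minute-granularity timestamp index map described above; class and method names are hypothetical, and persistence to the per-segment index file (or the log-compacted __timestamp_index variant) is left out:

import scala.collection.mutable

object TimestampIndexSketch {
  case class TopicPartitionSegment(topic: String, partition: Int, segmentBaseOffset: Long)

  // segment -> (minute-rounded timestamp -> smallest offset seen in that minute)
  private val index = mutable.Map.empty[TopicPartitionSegment, mutable.Map[Long, Long]]

  private def roundToMinute(timestampMs: Long): Long = timestampMs - (timestampMs % 60000L)

  // Called on append (leader or follower): record the first offset per minute.
  def maybeAdd(seg: TopicPartitionSegment, timestampMs: Long, offset: Long): Unit = {
    val perSegment = index.getOrElseUpdate(seg, mutable.Map.empty[Long, Long])
    val minute = roundToMinute(timestampMs)
    if (!perSegment.contains(minute)) {
      perSegment.put(minute, offset)
      // A real implementation would also append the new entry to the segment's
      // timestamp index file here, so the map can be rebuilt on restart.
    }
  }

  // Called when a log segment is deleted: drop its entries (and its index file).
  def onSegmentDelete(seg: TopicPartitionSegment): Unit = index.remove(seg)

  // Search: smallest recorded offset whose minute is >= the requested timestamp.
  def offsetForTime(seg: TopicPartitionSegment, timestampMs: Long): Option[Long] =
    index.get(seg).flatMap { m =>
      m.filter { case (minute, _) => minute >= roundToMinute(timestampMs) }
        .toSeq.sortBy(_._1).headOption.map(_._2)
    }
}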
> >> >> >> > > >>>> > >> >> >> > > >>>> On Fri, Sep 11, 2015 at 10:52 AM, Jiangjie Qin > >> >> >> > > >>>> <j...@linkedin.com.invalid> wrote: > >> >> >> > > >>>> > Ewen and Jay, > >> >> >> > > >>>> > > >> >> >> > > >>>> > They way I see the LogAppendTime is another format of > >> >> "offset". > >> >> >> It > >> >> >> > > >>>> serves > >> >> >> > > >>>> > the following purpose: > >> >> >> > > >>>> > 1. Locate messages not only by position, but also by > time. > >> >> The > >> >> >> > > >>>> difference > >> >> >> > > >>>> > from offset is timestamp is not unique for all messags. > >> >> >> > > >>>> > 2. Allow broker to manage messages based on time, e.g. > >> >> >> retention, > >> >> >> > > >>>> rolling > >> >> >> > > >>>> > 3. Provide convenience for user to search message not > >> only by > >> >> >> > > offset, > >> >> >> > > >>>> but > >> >> >> > > >>>> > also by timestamp. > >> >> >> > > >>>> > > >> >> >> > > >>>> > For purpose (2) we don't need per message server > >> timestamp. > >> >> We > >> >> >> > only > >> >> >> > > >>>> need > >> >> >> > > >>>> > per log segment server timestamp and propagate it among > >> >> brokers. > >> >> >> > > >>>> > > >> >> >> > > >>>> > For (1) and (3), we need per message timestamp. Then > the > >> >> >> question > >> >> >> > is > >> >> >> > > >>>> > whether we should use CreateTime or LogAppendTime? > >> >> >> > > >>>> > > >> >> >> > > >>>> > I completely agree that an application timestamp is > very > >> >> useful > >> >> >> > for > >> >> >> > > >>>> many > >> >> >> > > >>>> > use cases. But it seems to me that having Kafka to > >> understand > >> >> >> and > >> >> >> > > >>>> maintain > >> >> >> > > >>>> > application timestamp is a bit over demanding. So I > think > >> >> there > >> >> >> is > >> >> >> > > >>>> value to > >> >> >> > > >>>> > pass on CreateTime for application convenience, but I > am > >> not > >> >> >> sure > >> >> >> > it > >> >> >> > > >>>> can > >> >> >> > > >>>> > replace LogAppendTime. Managing out-of-order > CreateTime is > >> >> >> > > equivalent > >> >> >> > > >>>> to > >> >> >> > > >>>> > allowing producer to send their own offset and ask > broker > >> to > >> >> >> > manage > >> >> >> > > >>>> the > >> >> >> > > >>>> > offset for them, It is going to be very hard to > maintain > >> and > >> >> >> could > >> >> >> > > >>>> create > >> >> >> > > >>>> > huge performance/functional issue because of > complicated > >> >> logic. > >> >> >> > > >>>> > > >> >> >> > > >>>> > About whether we should expose LogAppendTime to > broker, I > >> >> agree > >> >> >> > that > >> >> >> > > >>>> server > >> >> >> > > >>>> > timestamp is internal to broker, but isn't offset also > an > >> >> >> internal > >> >> >> > > >>>> concept? > >> >> >> > > >>>> > Arguably it's not provided by producer so consumer > >> >> application > >> >> >> > logic > >> >> >> > > >>>> does > >> >> >> > > >>>> > not have to know offset. But user needs to know offset > >> >> because > >> >> >> > they > >> >> >> > > >>>> need to > >> >> >> > > >>>> > know "where is the message" in the log. LogAppendTime > >> >> provides > >> >> >> the > >> >> >> > > >>>> answer > >> >> >> > > >>>> > of "When was the message appended" to the log. So > >> personally > >> >> I > >> >> >> > think > >> >> >> > > >>>> it is > >> >> >> > > >>>> > reasonable to expose the LogAppendTime to consumers. > >> >> >> > > >>>> > > >> >> >> > > >>>> > I can see some use cases of exposing the > LogAppendTime, to > >> >> name > >> >> >> > > some: > >> >> >> > > >>>> > 1. 
Let's say broker has 7 days of log retention, some > >> >> >> application > >> >> >> > > >>>> wants to > >> >> >> > > >>>> > reprocess the data in past 3 days. User can simply > provide > >> >> the > >> >> >> > > >>>> timestamp > >> >> >> > > >>>> > and start consume. > >> >> >> > > >>>> > 2. User can easily know lag by time. > >> >> >> > > >>>> > 3. Cross cluster fail over. This is a more complicated > use > >> >> case, > >> >> >> > > >>>> there are > >> >> >> > > >>>> > two goals: 1) Not lose message; and 2) do not reconsume > >> tons > >> >> of > >> >> >> > > >>>> messages. > >> >> >> > > >>>> > Only knowing offset of cluster A won't help with > finding > >> fail > >> >> >> over > >> >> >> > > >>>> point in > >> >> >> > > >>>> > cluster B because an offset of a cluster means > nothing to > >> >> >> another > >> >> >> > > >>>> cluster. > >> >> >> > > >>>> > Timestamp however is a good cross cluster reference in > >> this > >> >> >> case. > >> >> >> > > >>>> > > >> >> >> > > >>>> > Thanks, > >> >> >> > > >>>> > > >> >> >> > > >>>> > Jiangjie (Becket) Qin > >> >> >> > > >>>> > > >> >> >> > > >>>> > On Thu, Sep 10, 2015 at 9:28 PM, Ewen Cheslack-Postava > < > >> >> >> > > >>>> e...@confluent.io> > >> >> >> > > >>>> > wrote: > >> >> >> > > >>>> > > >> >> >> > > >>>> >> Re: MM preserving timestamps: Yes, this was how I > >> >> interpreted > >> >> >> the > >> >> >> > > >>>> point in > >> >> >> > > >>>> >> the KIP and I only raised the issue because it > restricts > >> the > >> >> >> > > >>>> usefulness of > >> >> >> > > >>>> >> timestamps anytime MM is involved. I agree it's not a > >> deal > >> >> >> > breaker, > >> >> >> > > >>>> but I > >> >> >> > > >>>> >> wanted to understand exact impact of the change. Some > >> users > >> >> >> seem > >> >> >> > to > >> >> >> > > >>>> want to > >> >> >> > > >>>> >> be able to seek by application-defined timestamps > >> (despite > >> >> the > >> >> >> > many > >> >> >> > > >>>> obvious > >> >> >> > > >>>> >> issues involved), and the proposal clearly would not > >> support > >> >> >> that > >> >> >> > > >>>> unless > >> >> >> > > >>>> >> the timestamps submitted with the produce requests > were > >> >> >> > respected. > >> >> >> > > >>>> If we > >> >> >> > > >>>> >> ignore client submitted timestamps, then we probably > >> want to > >> >> >> try > >> >> >> > to > >> >> >> > > >>>> hide > >> >> >> > > >>>> >> the timestamps as much as possible in any public > >> interface > >> >> >> (e.g. > >> >> >> > > >>>> never > >> >> >> > > >>>> >> shows up in any public consumer APIs), but expose it > just > >> >> >> enough > >> >> >> > to > >> >> >> > > >>>> be > >> >> >> > > >>>> >> useful for operational purposes. > >> >> >> > > >>>> >> > >> >> >> > > >>>> >> Sorry if my devil's advocate position / attempt to map > >> the > >> >> >> design > >> >> >> > > >>>> space led > >> >> >> > > >>>> >> to some confusion! > >> >> >> > > >>>> >> > >> >> >> > > >>>> >> -Ewen > >> >> >> > > >>>> >> > >> >> >> > > >>>> >> > >> >> >> > > >>>> >> On Thu, Sep 10, 2015 at 5:48 PM, Jay Kreps < > >> >> j...@confluent.io> > >> >> >> > > wrote: > >> >> >> > > >>>> >> > >> >> >> > > >>>> >> > Ah, I see, I think I misunderstood about MM, it was > >> called > >> >> >> out > >> >> >> > in > >> >> >> > > >>>> the > >> >> >> > > >>>> >> > proposal and I thought you were saying you'd retain > the > >> >> >> > timestamp > >> >> >> > > >>>> but I > >> >> >> > > >>>> >> > think you're calling out that you're not. 
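Use case 1 above (reprocess the last 3 days by supplying a timestamp) is the kind of lookup this indexing enables. For reference, a sketch using the consumer-side timestamp search that shipped later (KafkaConsumer#offsetsForTimes, KIP-79); the API did not exist when this thread was written, and the broker address, topic name, and configs are placeholders:

import java.util.{Collections, Properties}
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.TopicPartition

object SeekToThreeDaysAgo {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")   // placeholder
    props.put("group.id", "reprocess-demo")             // placeholder
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

    val consumer = new KafkaConsumer[String, String](props)
    val tp = new TopicPartition("my-topic", 0)           // placeholder
    consumer.assign(Collections.singletonList(tp))

    // Look up the earliest offset whose timestamp is >= three days ago.
    val threeDaysAgo = java.lang.Long.valueOf(System.currentTimeMillis() - 3L * 24 * 60 * 60 * 1000)
    val result = consumer.offsetsForTimes(Collections.singletonMap(tp, threeDaysAgo))
    Option(result.get(tp)).foreach(oat => consumer.seek(tp, oat.offset()))

    // consumer.poll(...) from here reprocesses roughly the last 3 days.
    consumer.close()
  }
}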
In that > case > >> >> you do > >> >> >> > > have > >> >> >> > > >>>> the > >> >> >> > > >>>> >> > opposite problem, right? When you add mirroring for > a > >> >> topic > >> >> >> all > >> >> >> > > >>>> that data > >> >> >> > > >>>> >> > will have a timestamp of now and retention won't be > >> right. > >> >> >> Not > >> >> >> > a > >> >> >> > > >>>> blocker > >> >> >> > > >>>> >> > but a bit of a gotcha. > >> >> >> > > >>>> >> > > >> >> >> > > >>>> >> > -Jay > >> >> >> > > >>>> >> > > >> >> >> > > >>>> >> > > >> >> >> > > >>>> >> > > >> >> >> > > >>>> >> > On Thu, Sep 10, 2015 at 5:40 PM, Joel Koshy < > >> >> >> > jjkosh...@gmail.com > >> >> >> > > > > >> >> >> > > >>>> wrote: > >> >> >> > > >>>> >> > > >> >> >> > > >>>> >> > > > Don't you see all the same issues you see with > >> >> >> > client-defined > >> >> >> > > >>>> >> > timestamp's > >> >> >> > > >>>> >> > > > if you let mm control the timestamp as you were > >> >> >> proposing? > >> >> >> > > >>>> That means > >> >> >> > > >>>> >> > > time > >> >> >> > > >>>> >> > > > >> >> >> > > >>>> >> > > Actually I don't think that was in the proposal > (or > >> was > >> >> >> it?). > >> >> >> > > >>>> i.e., I > >> >> >> > > >>>> >> > > think it was always supposed to be controlled by > the > >> >> broker > >> >> >> > > (and > >> >> >> > > >>>> not > >> >> >> > > >>>> >> > > MM). > >> >> >> > > >>>> >> > > > >> >> >> > > >>>> >> > > > Also, Joel, can you just confirm that you guys > have > >> >> >> talked > >> >> >> > > >>>> through > >> >> >> > > >>>> >> the > >> >> >> > > >>>> >> > > > whole timestamp thing with the Samza folks at > LI? > >> The > >> >> >> > reason > >> >> >> > > I > >> >> >> > > >>>> ask > >> >> >> > > >>>> >> > about > >> >> >> > > >>>> >> > > > this is that Samza and Kafka Streams (KIP-28) > are > >> both > >> >> >> > trying > >> >> >> > > >>>> to rely > >> >> >> > > >>>> >> > on > >> >> >> > > >>>> >> > > > >> >> >> > > >>>> >> > > We have not. This is a good point - we will > >> follow-up. > >> >> >> > > >>>> >> > > > >> >> >> > > >>>> >> > > > WRT your idea of a FollowerFetchRequestI had > >> thought > >> >> of a > >> >> >> > > >>>> similar > >> >> >> > > >>>> >> idea > >> >> >> > > >>>> >> > > > where we use the leader's timestamps to > >> approximately > >> >> set > >> >> >> > the > >> >> >> > > >>>> >> > follower's > >> >> >> > > >>>> >> > > > timestamps. I had thought of just adding a > >> partition > >> >> >> > metadata > >> >> >> > > >>>> request > >> >> >> > > >>>> >> > > that > >> >> >> > > >>>> >> > > > would subsume the current offset/time lookup and > >> >> could be > >> >> >> > > used > >> >> >> > > >>>> by the > >> >> >> > > >>>> >> > > > follower to try to approximately keep their > >> timestamps > >> >> >> > > kosher. > >> >> >> > > >>>> It's a > >> >> >> > > >>>> >> > > > little hacky and doesn't help with MM but it is > >> also > >> >> >> maybe > >> >> >> > > less > >> >> >> > > >>>> >> > invasive > >> >> >> > > >>>> >> > > so > >> >> >> > > >>>> >> > > > that approach could be viable. > >> >> >> > > >>>> >> > > > >> >> >> > > >>>> >> > > That would also work, but perhaps responding with > the > >> >> >> actual > >> >> >> > > >>>> leader > >> >> >> > > >>>> >> > > offset-timestamp entries (corresponding to the > >> fetched > >> >> >> > portion) > >> >> >> > > >>>> would > >> >> >> > > >>>> >> > > be exact and it should be small as well. 
Anyway, > the > >> >> main > >> >> >> > > >>>> motivation > >> >> >> > > >>>> >> > > in this was to avoid leaking server-side > timestamps > >> to > >> >> the > >> >> >> > > >>>> >> > > message-format if people think it is worth it so > the > >> >> >> > > >>>> alternatives are > >> >> >> > > >>>> >> > > implementation details. My original instinct was > >> that it > >> >> >> also > >> >> >> > > >>>> avoids a > >> >> >> > > >>>> >> > > backwards incompatible change (but it does not > >> because > >> >> we > >> >> >> > also > >> >> >> > > >>>> have > >> >> >> > > >>>> >> > > the relative offset change). > >> >> >> > > >>>> >> > > > >> >> >> > > >>>> >> > > Thanks, > >> >> >> > > >>>> >> > > > >> >> >> > > >>>> >> > > Joel > >> >> >> > > >>>> >> > > > >> >> >> > > >>>> >> > > > > >> >> >> > > >>>> >> > > > > >> >> >> > > >>>> >> > > > > >> >> >> > > >>>> >> > > > On Thu, Sep 10, 2015 at 3:36 PM, Joel Koshy < > >> >> >> > > >>>> jjkosh...@gmail.com> > >> >> >> > > >>>> >> > wrote: > >> >> >> > > >>>> >> > > > > >> >> >> > > >>>> >> > > >> I just wanted to comment on a few points made > >> >> earlier in > >> >> >> > > this > >> >> >> > > >>>> >> thread: > >> >> >> > > >>>> >> > > >> > >> >> >> > > >>>> >> > > >> Concerns on clock skew: at least for the > original > >> >> >> > proposal's > >> >> >> > > >>>> scope > >> >> >> > > >>>> >> > > >> (which was more for honoring retention > >> broker-side) > >> >> this > >> >> >> > > >>>> would only > >> >> >> > > >>>> >> be > >> >> >> > > >>>> >> > > >> an issue when spanning leader movements right? > >> i.e., > >> >> >> > leader > >> >> >> > > >>>> >> migration > >> >> >> > > >>>> >> > > >> latency has to be much less than clock skew for > >> this > >> >> to > >> >> >> > be a > >> >> >> > > >>>> real > >> >> >> > > >>>> >> > > >> issue wouldn’t it? > >> >> >> > > >>>> >> > > >> > >> >> >> > > >>>> >> > > >> Client timestamp vs broker timestamp: I’m not > sure > >> >> Kafka > >> >> >> > > >>>> (brokers) > >> >> >> > > >>>> >> are > >> >> >> > > >>>> >> > > >> the right place to reason about client-side > >> >> timestamps > >> >> >> > > >>>> precisely due > >> >> >> > > >>>> >> > > >> to the nuances that have been discussed at > length > >> in > >> >> >> this > >> >> >> > > >>>> thread. My > >> >> >> > > >>>> >> > > >> preference would have been to the timestamp > (now > >> >> called > >> >> >> > > >>>> >> > > >> LogAppendTimestamp) have nothing to do with the > >> >> >> > > applications. > >> >> >> > > >>>> Ewen > >> >> >> > > >>>> >> > > >> raised a valid concern about leaking such > >> >> >> > > >>>> “private/server-side” > >> >> >> > > >>>> >> > > >> timestamps into the protocol spec. i.e., it is > >> fine > >> >> to > >> >> >> > have > >> >> >> > > >>>> the > >> >> >> > > >>>> >> > > >> CreateTime which is expressly client-provided > and > >> >> >> > immutable > >> >> >> > > >>>> >> > > >> thereafter, but the LogAppendTime is also going > >> part > >> >> of > >> >> >> > the > >> >> >> > > >>>> protocol > >> >> >> > > >>>> >> > > >> and it would be good to avoid exposure (to > client > >> >> >> > > developers) > >> >> >> > > >>>> if > >> >> >> > > >>>> >> > > >> possible. Ok, so here is a slightly different > >> >> approach > >> >> >> > that > >> >> >> > > I > >> >> >> > > >>>> was > >> >> >> > > >>>> >> just > >> >> >> > > >>>> >> > > >> thinking about (and did not think too far so it > >> may > >> >> not > >> >> >> > > >>>> work): do > >> >> >> > > >>>> >> not > >> >> >> > > >>>> >> > > >> add the LogAppendTime to messages. 
Instead, > build > >> the > >> >> >> > > >>>> time-based > >> >> >> > > >>>> >> index > >> >> >> > > >>>> >> > > >> on the server side on message arrival time > alone. > >> >> >> > Introduce > >> >> >> > > a > >> >> >> > > >>>> new > >> >> >> > > >>>> >> > > >> ReplicaFetchRequest/Response pair. > >> >> ReplicaFetchResponses > >> >> >> > > will > >> >> >> > > >>>> also > >> >> >> > > >>>> >> > > >> include the slice of the time-based index for > the > >> >> >> follower > >> >> >> > > >>>> broker. > >> >> >> > > >>>> >> > > >> This way we can at least keep timestamps > aligned > >> >> across > >> >> >> > > >>>> brokers for > >> >> >> > > >>>> >> > > >> retention purposes. We do lose the append > >> timestamp > >> >> for > >> >> >> > > >>>> mirroring > >> >> >> > > >>>> >> > > >> pipelines (which appears to be the case in > KIP-32 > >> as > >> >> >> > well). > >> >> >> > > >>>> >> > > >> > >> >> >> > > >>>> >> > > >> Configurable index granularity: We can do this > but > >> >> I’m > >> >> >> not > >> >> >> > > >>>> sure it > >> >> >> > > >>>> >> is > >> >> >> > > >>>> >> > > >> very useful and as Jay noted, a major change > from > >> the > >> >> >> old > >> >> >> > > >>>> proposal > >> >> >> > > >>>> >> > > >> linked from the KIP is the sparse time-based > index > >> >> which > >> >> >> > we > >> >> >> > > >>>> felt was > >> >> >> > > >>>> >> > > >> essential to bound memory usage (and having > >> >> timestamps > >> >> >> on > >> >> >> > > >>>> each log > >> >> >> > > >>>> >> > > >> index entry was probably a big waste since in > the > >> >> common > >> >> >> > > case > >> >> >> > > >>>> >> several > >> >> >> > > >>>> >> > > >> messages span the same timestamp). BTW another > >> >> benefit > >> >> >> of > >> >> >> > > the > >> >> >> > > >>>> second > >> >> >> > > >>>> >> > > >> index is that it makes it easier to roll-back > or > >> >> throw > >> >> >> > away > >> >> >> > > if > >> >> >> > > >>>> >> > > >> necessary (vs. modifying the existing index > >> format) - > >> >> >> > > >>>> although that > >> >> >> > > >>>> >> > > >> obviously does not help with rolling back the > >> >> timestamp > >> >> >> > > >>>> change in > >> >> >> > > >>>> >> the > >> >> >> > > >>>> >> > > >> message format, but it is one less thing to > worry > >> >> about. > >> >> >> > > >>>> >> > > >> > >> >> >> > > >>>> >> > > >> Versioning: I’m not sure everyone is saying the > >> same > >> >> >> thing > >> >> >> > > >>>> wrt the > >> >> >> > > >>>> >> > > >> scope of this. There is the record format > change, > >> >> but I > >> >> >> > also > >> >> >> > > >>>> think > >> >> >> > > >>>> >> > > >> this ties into all of the API versioning that > we > >> >> already > >> >> >> > > have > >> >> >> > > >>>> in > >> >> >> > > >>>> >> > > >> Kafka. The current API versioning approach > works > >> fine > >> >> >> for > >> >> >> > > >>>> >> > > >> upgrades/downgrades across official Kafka > >> releases, > >> >> but > >> >> >> > not > >> >> >> > > >>>> so well > >> >> >> > > >>>> >> > > >> between releases. (We almost got bitten by > this at > >> >> >> > LinkedIn > >> >> >> > > >>>> with the > >> >> >> > > >>>> >> > > >> recent changes to various requests but were > able > >> to > >> >> work > >> >> >> > > >>>> around > >> >> >> > > >>>> >> > > >> these.) We can clarify this in the follow-up > KIP. 
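A minimal sketch of the sparse, separate time-based index mentioned above: entries are appended only when the timestamp has advanced by a configurable granularity, which bounds memory, and the structure can be thrown away and rebuilt independently of the offset index. The entry layout and lookup semantics here are assumptions for illustration, not a proposed on-disk format.

import scala.collection.mutable.ArrayBuffer

class SparseTimeIndex(granularityMs: Long) {
  private case class Entry(timestampMs: Long, offset: Long)
  private val entries = ArrayBuffer.empty[Entry]

  // Append-only, called with non-decreasing (broker-assigned) timestamps.
  def maybeAppend(timestampMs: Long, offset: Long): Unit = {
    val sparseEnough = entries.isEmpty ||
      timestampMs >= entries.last.timestampMs + granularityMs
    if (sparseEnough) entries += Entry(timestampMs, offset)
  }

  // Conservative lookup: largest indexed offset whose timestamp is <= target,
  // i.e. a safe position to start scanning from.
  def lookup(targetMs: Long): Option[Long] =
    entries.takeWhile(_.timestampMs <= targetMs).lastOption.map(_.offset)
}

With a one-minute granularity the index grows by at most one entry per minute of appended data, regardless of how many messages share the same timestamp.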
> >> >> >> > > >>>> >> > > >> > >> >> >> > > >>>> >> > > >> Thanks, > >> >> >> > > >>>> >> > > >> > >> >> >> > > >>>> >> > > >> Joel > >> >> >> > > >>>> >> > > >> > >> >> >> > > >>>> >> > > >> > >> >> >> > > >>>> >> > > >> On Thu, Sep 10, 2015 at 3:00 PM, Jiangjie Qin > >> >> >> > > >>>> >> > <j...@linkedin.com.invalid > >> >> >> > > >>>> >> > > > > >> >> >> > > >>>> >> > > >> wrote: > >> >> >> > > >>>> >> > > >> > Hi Jay, > >> >> >> > > >>>> >> > > >> > > >> >> >> > > >>>> >> > > >> > I just changed the KIP title and updated the > KIP > >> >> page. > >> >> >> > > >>>> >> > > >> > > >> >> >> > > >>>> >> > > >> > And yes, we are working on a general version > >> >> control > >> >> >> > > >>>> proposal to > >> >> >> > > >>>> >> > make > >> >> >> > > >>>> >> > > the > >> >> >> > > >>>> >> > > >> > protocol migration like this more smooth. I > will > >> >> also > >> >> >> > > >>>> create a KIP > >> >> >> > > >>>> >> > for > >> >> >> > > >>>> >> > > >> that > >> >> >> > > >>>> >> > > >> > soon. > >> >> >> > > >>>> >> > > >> > > >> >> >> > > >>>> >> > > >> > Thanks, > >> >> >> > > >>>> >> > > >> > > >> >> >> > > >>>> >> > > >> > Jiangjie (Becket) Qin > >> >> >> > > >>>> >> > > >> > > >> >> >> > > >>>> >> > > >> > > >> >> >> > > >>>> >> > > >> > On Thu, Sep 10, 2015 at 2:21 PM, Jay Kreps < > >> >> >> > > >>>> j...@confluent.io> > >> >> >> > > >>>> >> > wrote: > >> >> >> > > >>>> >> > > >> > > >> >> >> > > >>>> >> > > >> >> Great, can we change the name to something > >> >> related to > >> >> >> > the > >> >> >> > > >>>> >> > > >> change--"KIP-31: > >> >> >> > > >>>> >> > > >> >> Move to relative offsets in compressed > message > >> >> sets". > >> >> >> > > >>>> >> > > >> >> > >> >> >> > > >>>> >> > > >> >> Also you had mentioned before you were > going to > >> >> >> expand > >> >> >> > on > >> >> >> > > >>>> the > >> >> >> > > >>>> >> > > mechanics > >> >> >> > > >>>> >> > > >> of > >> >> >> > > >>>> >> > > >> >> handling these log format changes, right? > >> >> >> > > >>>> >> > > >> >> > >> >> >> > > >>>> >> > > >> >> -Jay > >> >> >> > > >>>> >> > > >> >> > >> >> >> > > >>>> >> > > >> >> On Thu, Sep 10, 2015 at 12:42 PM, Jiangjie > Qin > >> >> >> > > >>>> >> > > >> <j...@linkedin.com.invalid> > >> >> >> > > >>>> >> > > >> >> wrote: > >> >> >> > > >>>> >> > > >> >> > >> >> >> > > >>>> >> > > >> >> > Neha and Jay, > >> >> >> > > >>>> >> > > >> >> > > >> >> >> > > >>>> >> > > >> >> > Thanks a lot for the feedback. Good point > >> about > >> >> >> > > >>>> splitting the > >> >> >> > > >>>> >> > > >> >> discussion. I > >> >> >> > > >>>> >> > > >> >> > have split the proposal to three KIPs and > it > >> >> does > >> >> >> > make > >> >> >> > > >>>> each > >> >> >> > > >>>> >> > > discussion > >> >> >> > > >>>> >> > > >> >> more > >> >> >> > > >>>> >> > > >> >> > clear: > >> >> >> > > >>>> >> > > >> >> > KIP-31 - Message format change (Use > relative > >> >> >> offset) > >> >> >> > > >>>> >> > > >> >> > KIP-32 - Add CreateTime and LogAppendTime > to > >> >> Kafka > >> >> >> > > >>>> message > >> >> >> > > >>>> >> > > >> >> > KIP-33 - Build a time-based log index > >> >> >> > > >>>> >> > > >> >> > > >> >> >> > > >>>> >> > > >> >> > KIP-33 can be a follow up KIP for KIP-32, > so > >> we > >> >> can > >> >> >> > > >>>> discuss > >> >> >> > > >>>> >> about > >> >> >> > > >>>> >> > > >> KIP-31 > >> >> >> > > >>>> >> > > >> >> > and KIP-32 first for now. 
I will create a > >> >> separate > >> >> >> > > >>>> discussion > >> >> >> > > >>>> >> > > thread > >> >> >> > > >>>> >> > > >> for > >> >> >> > > >>>> >> > > >> >> > KIP-32 and reply the concerns you raised > >> >> regarding > >> >> >> > the > >> >> >> > > >>>> >> timestamp. > >> >> >> > > >>>> >> > > >> >> > > >> >> >> > > >>>> >> > > >> >> > So far it looks there is no objection to > >> KIP-31. > >> >> >> > Since > >> >> >> > > I > >> >> >> > > >>>> >> removed > >> >> >> > > >>>> >> > a > >> >> >> > > >>>> >> > > few > >> >> >> > > >>>> >> > > >> >> part > >> >> >> > > >>>> >> > > >> >> > from previous KIP and only left the > relative > >> >> offset > >> >> >> > > >>>> proposal, > >> >> >> > > >>>> >> it > >> >> >> > > >>>> >> > > >> would be > >> >> >> > > >>>> >> > > >> >> > great if people can take another look to > see > >> if > >> >> >> there > >> >> >> > > is > >> >> >> > > >>>> any > >> >> >> > > >>>> >> > > concerns. > >> >> >> > > >>>> >> > > >> >> > > >> >> >> > > >>>> >> > > >> >> > Thanks, > >> >> >> > > >>>> >> > > >> >> > > >> >> >> > > >>>> >> > > >> >> > Jiangjie (Becket) Qin > >> >> >> > > >>>> >> > > >> >> > > >> >> >> > > >>>> >> > > >> >> > > >> >> >> > > >>>> >> > > >> >> > On Tue, Sep 8, 2015 at 1:28 PM, Neha > >> Narkhede < > >> >> >> > > >>>> >> n...@confluent.io > >> >> >> > > >>>> >> > > > >> >> >> > > >>>> >> > > >> wrote: > >> >> >> > > >>>> >> > > >> >> > > >> >> >> > > >>>> >> > > >> >> > > Becket, > >> >> >> > > >>>> >> > > >> >> > > > >> >> >> > > >>>> >> > > >> >> > > Nice write-up. Few thoughts - > >> >> >> > > >>>> >> > > >> >> > > > >> >> >> > > >>>> >> > > >> >> > > I'd split up the discussion for > simplicity. > >> >> Note > >> >> >> > that > >> >> >> > > >>>> you can > >> >> >> > > >>>> >> > > always > >> >> >> > > >>>> >> > > >> >> > group > >> >> >> > > >>>> >> > > >> >> > > several of these in one patch to reduce > the > >> >> >> > protocol > >> >> >> > > >>>> changes > >> >> >> > > >>>> >> > > people > >> >> >> > > >>>> >> > > >> >> have > >> >> >> > > >>>> >> > > >> >> > to > >> >> >> > > >>>> >> > > >> >> > > deal with.This is just a suggestion, > but I > >> >> think > >> >> >> > the > >> >> >> > > >>>> >> following > >> >> >> > > >>>> >> > > split > >> >> >> > > >>>> >> > > >> >> > might > >> >> >> > > >>>> >> > > >> >> > > make it easier to tackle the changes > being > >> >> >> > proposed - > >> >> >> > > >>>> >> > > >> >> > > > >> >> >> > > >>>> >> > > >> >> > > - Relative offsets > >> >> >> > > >>>> >> > > >> >> > > - Introducing the concept of time > >> >> >> > > >>>> >> > > >> >> > > - Time-based indexing (separate the > >> usage > >> >> of > >> >> >> the > >> >> >> > > >>>> timestamp > >> >> >> > > >>>> >> > > field > >> >> >> > > >>>> >> > > >> >> from > >> >> >> > > >>>> >> > > >> >> > > how/whether we want to include a > >> timestamp > >> >> in > >> >> >> > the > >> >> >> > > >>>> message) > >> >> >> > > >>>> >> > > >> >> > > > >> >> >> > > >>>> >> > > >> >> > > I'm a +1 on relative offsets, we > should've > >> >> done > >> >> >> it > >> >> >> > > >>>> back when > >> >> >> > > >>>> >> we > >> >> >> > > >>>> >> > > >> >> > introduced > >> >> >> > > >>>> >> > > >> >> > > it. Other than reducing the CPU > overhead, > >> this > >> >> >> will > >> >> >> > > >>>> also > >> >> >> > > >>>> >> reduce > >> >> >> > > >>>> >> > > the > >> >> >> > > >>>> >> > > >> >> > garbage > >> >> >> > > >>>> >> > > >> >> > > collection overhead on the brokers. 
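A hedged sketch of the relative-offset scheme itself, roughly along the lines KIP-31 proposes: the wrapper keeps the absolute offset of the last inner message, inner messages carry 0-based relative offsets fixed by the producer, and the broker assigns offsets by touching only the wrapper, which is where the CPU and GC savings come from. Field names are illustrative, not the wire format.

object RelativeOffsetSketch {
  case class InnerMessage(relativeOffset: Int, payload: String)
  case class Wrapper(lastAbsoluteOffset: Long, inner: Seq[InnerMessage])

  // Producer side: relative offsets are fixed at 0..n-1 before compression.
  def build(payloads: Seq[String]): Seq[InnerMessage] =
    payloads.zipWithIndex.map { case (p, i) => InnerMessage(i, p) }

  // Broker side: only the wrapper offset is assigned; no decompress/recompress.
  def assign(inner: Seq[InnerMessage], nextOffset: Long): Wrapper =
    Wrapper(nextOffset + inner.size - 1, inner)

  // Consumer side: reconstruct absolute offsets from the wrapper offset.
  def absoluteOffsets(w: Wrapper): Seq[Long] = {
    val last = w.inner.last.relativeOffset
    w.inner.map(m => w.lastAbsoluteOffset - (last - m.relativeOffset))
  }

  def main(args: Array[String]): Unit = {
    val wrapper = assign(build(Seq("a", "b", "c", "d")), nextOffset = 45L)
    println(wrapper.lastAbsoluteOffset)   // 48
    println(absoluteOffsets(wrapper))     // List(45, 46, 47, 48)
  }
}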
> >> >> >> > > >>>> >> > > >> >> > > > >> >> >> > > >>>> >> > > >> >> > > On the timestamp field, I generally > agree > >> >> that we > >> >> >> > > >>>> should add > >> >> >> > > >>>> >> a > >> >> >> > > >>>> >> > > >> >> timestamp > >> >> >> > > >>>> >> > > >> >> > to > >> >> >> > > >>>> >> > > >> >> > > a Kafka message but I'm not quite sold > on > >> how > >> >> >> this > >> >> >> > > KIP > >> >> >> > > >>>> >> suggests > >> >> >> > > >>>> >> > > the > >> >> >> > > >>>> >> > > >> >> > > timestamp be set. Will avoid repeating > the > >> >> >> > downsides > >> >> >> > > >>>> of a > >> >> >> > > >>>> >> > broker > >> >> >> > > >>>> >> > > >> side > >> >> >> > > >>>> >> > > >> >> > > timestamp mentioned previously in this > >> >> thread. I > >> >> >> > > think > >> >> >> > > >>>> the > >> >> >> > > >>>> >> > topic > >> >> >> > > >>>> >> > > of > >> >> >> > > >>>> >> > > >> >> > > including a timestamp in a Kafka message > >> >> >> requires a > >> >> >> > > >>>> lot more > >> >> >> > > >>>> >> > > thought > >> >> >> > > >>>> >> > > >> >> and > >> >> >> > > >>>> >> > > >> >> > > details than what's in this KIP. I'd > >> suggest > >> >> we > >> >> >> > make > >> >> >> > > >>>> it a > >> >> >> > > >>>> >> > > separate > >> >> >> > > >>>> >> > > >> KIP > >> >> >> > > >>>> >> > > >> >> > that > >> >> >> > > >>>> >> > > >> >> > > includes a list of all the different use > >> cases > >> >> >> for > >> >> >> > > the > >> >> >> > > >>>> >> > timestamp > >> >> >> > > >>>> >> > > >> >> (beyond > >> >> >> > > >>>> >> > > >> >> > > log retention) including stream > processing > >> and > >> >> >> > > discuss > >> >> >> > > >>>> >> > tradeoffs > >> >> >> > > >>>> >> > > of > >> >> >> > > >>>> >> > > >> >> > > including client and broker side > >> timestamps. > >> >> >> > > >>>> >> > > >> >> > > > >> >> >> > > >>>> >> > > >> >> > > Agree with the benefit of time-based > >> indexing, > >> >> >> but > >> >> >> > > >>>> haven't > >> >> >> > > >>>> >> had > >> >> >> > > >>>> >> > a > >> >> >> > > >>>> >> > > >> chance > >> >> >> > > >>>> >> > > >> >> > to > >> >> >> > > >>>> >> > > >> >> > > dive into the design details yet. > >> >> >> > > >>>> >> > > >> >> > > > >> >> >> > > >>>> >> > > >> >> > > Thanks, > >> >> >> > > >>>> >> > > >> >> > > Neha > >> >> >> > > >>>> >> > > >> >> > > > >> >> >> > > >>>> >> > > >> >> > > On Tue, Sep 8, 2015 at 10:57 AM, Jay > Kreps > >> < > >> >> >> > > >>>> j...@confluent.io > >> >> >> > > >>>> >> > > >> >> >> > > >>>> >> > > >> wrote: > >> >> >> > > >>>> >> > > >> >> > > > >> >> >> > > >>>> >> > > >> >> > > > Hey Beckett, > >> >> >> > > >>>> >> > > >> >> > > > > >> >> >> > > >>>> >> > > >> >> > > > I was proposing splitting up the KIP > just > >> >> for > >> >> >> > > >>>> simplicity of > >> >> >> > > >>>> >> > > >> >> discussion. > >> >> >> > > >>>> >> > > >> >> > > You > >> >> >> > > >>>> >> > > >> >> > > > can still implement them in one > patch. I > >> >> think > >> >> >> > > >>>> otherwise it > >> >> >> > > >>>> >> > > will > >> >> >> > > >>>> >> > > >> be > >> >> >> > > >>>> >> > > >> >> > hard > >> >> >> > > >>>> >> > > >> >> > > to > >> >> >> > > >>>> >> > > >> >> > > > discuss/vote on them since if you like > >> the > >> >> >> offset > >> >> >> > > >>>> proposal > >> >> >> > > >>>> >> > but > >> >> >> > > >>>> >> > > not > >> >> >> > > >>>> >> > > >> >> the > >> >> >> > > >>>> >> > > >> >> > > time > >> >> >> > > >>>> >> > > >> >> > > > proposal what do you do? 
> >> >> >> > > >>>> >> > > >> >> > > > > >> >> >> > > >>>> >> > > >> >> > > > Introducing a second notion of time > into > >> >> Kafka > >> >> >> > is a > >> >> >> > > >>>> pretty > >> >> >> > > >>>> >> > > massive > >> >> >> > > >>>> >> > > >> >> > > > philosophical change so it kind of > >> warrants > >> >> >> it's > >> >> >> > > own > >> >> >> > > >>>> KIP I > >> >> >> > > >>>> >> > > think > >> >> >> > > >>>> >> > > >> it > >> >> >> > > >>>> >> > > >> >> > isn't > >> >> >> > > >>>> >> > > >> >> > > > just "Change message format". > >> >> >> > > >>>> >> > > >> >> > > > > >> >> >> > > >>>> >> > > >> >> > > > WRT time I think one thing to clarify > in > >> the > >> >> >> > > >>>> proposal is > >> >> >> > > >>>> >> how > >> >> >> > > >>>> >> > MM > >> >> >> > > >>>> >> > > >> will > >> >> >> > > >>>> >> > > >> >> > have > >> >> >> > > >>>> >> > > >> >> > > > access to set the timestamp? > Presumably > >> this > >> >> >> will > >> >> >> > > be > >> >> >> > > >>>> a new > >> >> >> > > >>>> >> > > field > >> >> >> > > >>>> >> > > >> in > >> >> >> > > >>>> >> > > >> >> > > > ProducerRecord, right? If so then any > >> user > >> >> can > >> >> >> > set > >> >> >> > > >>>> the > >> >> >> > > >>>> >> > > timestamp, > >> >> >> > > >>>> >> > > >> >> > right? > >> >> >> > > >>>> >> > > >> >> > > > I'm not sure you answered the > questions > >> >> around > >> >> >> > how > >> >> >> > > >>>> this > >> >> >> > > >>>> >> will > >> >> >> > > >>>> >> > > work > >> >> >> > > >>>> >> > > >> for > >> >> >> > > >>>> >> > > >> >> > MM > >> >> >> > > >>>> >> > > >> >> > > > since when MM retains timestamps from > >> >> multiple > >> >> >> > > >>>> partitions > >> >> >> > > >>>> >> > they > >> >> >> > > >>>> >> > > >> will > >> >> >> > > >>>> >> > > >> >> > then > >> >> >> > > >>>> >> > > >> >> > > be > >> >> >> > > >>>> >> > > >> >> > > > out of order and in the past (so the > >> >> >> > > >>>> >> > max(lastAppendedTimestamp, > >> >> >> > > >>>> >> > > >> >> > > > currentTimeMillis) override you > proposed > >> >> will > >> >> >> not > >> >> >> > > >>>> work, > >> >> >> > > >>>> >> > > right?). > >> >> >> > > >>>> >> > > >> If > >> >> >> > > >>>> >> > > >> >> we > >> >> >> > > >>>> >> > > >> >> > > > don't do this then when you set up > >> mirroring > >> >> >> the > >> >> >> > > >>>> data will > >> >> >> > > >>>> >> > all > >> >> >> > > >>>> >> > > be > >> >> >> > > >>>> >> > > >> new > >> >> >> > > >>>> >> > > >> >> > and > >> >> >> > > >>>> >> > > >> >> > > > you have the same retention problem > you > >> >> >> > described. > >> >> >> > > >>>> Maybe I > >> >> >> > > >>>> >> > > missed > >> >> >> > > >>>> >> > > >> >> > > > something...? > >> >> >> > > >>>> >> > > >> >> > > > > >> >> >> > > >>>> >> > > >> >> > > > My main motivation is that given that > >> both > >> >> >> Samza > >> >> >> > > and > >> >> >> > > >>>> Kafka > >> >> >> > > >>>> >> > > streams > >> >> >> > > >>>> >> > > >> >> are > >> >> >> > > >>>> >> > > >> >> > > > doing work that implies a mandatory > >> >> >> > client-defined > >> >> >> > > >>>> notion > >> >> >> > > >>>> >> of > >> >> >> > > >>>> >> > > >> time, I > >> >> >> > > >>>> >> > > >> >> > > really > >> >> >> > > >>>> >> > > >> >> > > > think introducing a different > mandatory > >> >> notion > >> >> >> of > >> >> >> > > >>>> time in > >> >> >> > > >>>> >> > > Kafka is > >> >> >> > > >>>> >> > > >> >> > going > >> >> >> > > >>>> >> > > >> >> > > to > >> >> >> > > >>>> >> > > >> >> > > > be quite odd. 
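For reference on the question above about mirror maker setting the timestamp through a new ProducerRecord field: the producer API that eventually came out of KIP-32 accepts an optional timestamp in the ProducerRecord constructor. A usage sketch (this constructor did not exist when the thread was written; broker address, topic, key, and value are placeholders):

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object CreateTimePassthrough {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")  // placeholder
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    val producer = new KafkaProducer[String, String](props)

    // A mirroring process can carry the source record's CreateTime forward by
    // setting the (nullable) timestamp field explicitly.
    val sourceCreateTime = java.lang.Long.valueOf(1443660000000L)
    val record = new ProducerRecord[String, String](
      "mirrored-topic",    // placeholder topic
      null,                // partition: let the partitioner decide
      sourceCreateTime,    // CreateTime carried over from the source cluster
      "key", "value")
    producer.send(record)
    producer.close()
  }
}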
We should think hard > about > >> how > >> >> >> > > >>>> client-defined > >> >> >> > > >>>> >> > > time > >> >> >> > > >>>> >> > > >> >> could > >> >> >> > > >>>> >> > > >> >> > > > work. I'm not sure if it can, but I'm > >> also > >> >> not > >> >> >> > sure > >> >> >> > > >>>> that it > >> >> >> > > >>>> >> > > can't. > >> >> >> > > >>>> >> > > >> >> > Having > >> >> >> > > >>>> >> > > >> >> > > > both will be odd. Did you chat about > this > >> >> with > >> >> >> > > >>>> Yi/Kartik on > >> >> >> > > >>>> >> > the > >> >> >> > > >>>> >> > > >> Samza > >> >> >> > > >>>> >> > > >> >> > > side? > >> >> >> > > >>>> >> > > >> >> > > > > >> >> >> > > >>>> >> > > >> >> > > > When you are saying it won't work you > are > >> >> >> > assuming > >> >> >> > > >>>> some > >> >> >> > > >>>> >> > > particular > >> >> >> > > >>>> >> > > >> >> > > > implementation? Maybe that the index > is a > >> >> >> > > >>>> monotonically > >> >> >> > > >>>> >> > > increasing > >> >> >> > > >>>> >> > > >> >> set > >> >> >> > > >>>> >> > > >> >> > of > >> >> >> > > >>>> >> > > >> >> > > > pointers to the least record with a > >> >> timestamp > >> >> >> > > larger > >> >> >> > > >>>> than > >> >> >> > > >>>> >> the > >> >> >> > > >>>> >> > > >> index > >> >> >> > > >>>> >> > > >> >> > time? > >> >> >> > > >>>> >> > > >> >> > > > In other words a search for time X > gives > >> the > >> >> >> > > largest > >> >> >> > > >>>> offset > >> >> >> > > >>>> >> > at > >> >> >> > > >>>> >> > > >> which > >> >> >> > > >>>> >> > > >> >> > all > >> >> >> > > >>>> >> > > >> >> > > > records are <= X? > >> >> >> > > >>>> >> > > >> >> > > > > >> >> >> > > >>>> >> > > >> >> > > > For retention, I agree with the > problem > >> you > >> >> >> point > >> >> >> > > >>>> out, but > >> >> >> > > >>>> >> I > >> >> >> > > >>>> >> > > think > >> >> >> > > >>>> >> > > >> >> what > >> >> >> > > >>>> >> > > >> >> > > you > >> >> >> > > >>>> >> > > >> >> > > > are saying in that case is that you > want > >> a > >> >> size > >> >> >> > > >>>> limit too. > >> >> >> > > >>>> >> If > >> >> >> > > >>>> >> > > you > >> >> >> > > >>>> >> > > >> use > >> >> >> > > >>>> >> > > >> >> > > > system time you actually hit the same > >> >> problem: > >> >> >> > say > >> >> >> > > >>>> you do a > >> >> >> > > >>>> >> > > full > >> >> >> > > >>>> >> > > >> dump > >> >> >> > > >>>> >> > > >> >> > of > >> >> >> > > >>>> >> > > >> >> > > a > >> >> >> > > >>>> >> > > >> >> > > > DB table with a setting of 7 days > >> retention, > >> >> >> your > >> >> >> > > >>>> retention > >> >> >> > > >>>> >> > > will > >> >> >> > > >>>> >> > > >> >> > actually > >> >> >> > > >>>> >> > > >> >> > > > not get enforced for the first 7 days > >> >> because > >> >> >> the > >> >> >> > > >>>> data is > >> >> >> > > >>>> >> > "new > >> >> >> > > >>>> >> > > to > >> >> >> > > >>>> >> > > >> >> > Kafka". > >> >> >> > > >>>> >> > > >> >> > > > > >> >> >> > > >>>> >> > > >> >> > > > -Jay > >> >> >> > > >>>> >> > > >> >> > > > > >> >> >> > > >>>> >> > > >> >> > > > > >> >> >> > > >>>> >> > > >> >> > > > On Mon, Sep 7, 2015 at 10:44 AM, > Jiangjie > >> >> Qin > >> >> >> > > >>>> >> > > >> >> > <j...@linkedin.com.invalid > >> >> >> > > >>>> >> > > >> >> > > > > >> >> >> > > >>>> >> > > >> >> > > > wrote: > >> >> >> > > >>>> >> > > >> >> > > > > >> >> >> > > >>>> >> > > >> >> > > > > Jay, > >> >> >> > > >>>> >> > > >> >> > > > > > >> >> >> > > >>>> >> > > >> >> > > > > Thanks for the comments. 
Yes, there > are > >> >> >> > actually > >> >> >> > > >>>> three > >> >> >> > > >>>> >> > > >> proposals as > >> >> >> > > >>>> >> > > >> >> > you > >> >> >> > > >>>> >> > > >> >> > > > > pointed out. > >> >> >> > > >>>> >> > > >> >> > > > > > >> >> >> > > >>>> >> > > >> >> > > > > We will have a separate proposal for > >> (1) - > >> >> >> > > version > >> >> >> > > >>>> >> control > >> >> >> > > >>>> >> > > >> >> mechanism. > >> >> >> > > >>>> >> > > >> >> > > We > >> >> >> > > >>>> >> > > >> >> > > > > actually thought about whether we > want > >> to > >> >> >> > > separate > >> >> >> > > >>>> 2 and > >> >> >> > > >>>> >> 3 > >> >> >> > > >>>> >> > > >> >> internally > >> >> >> > > >>>> >> > > >> >> > > > > before creating the KIP. The reason > we > >> >> put 2 > >> >> >> > and > >> >> >> > > 3 > >> >> >> > > >>>> >> together > >> >> >> > > >>>> >> > > is > >> >> >> > > >>>> >> > > >> it > >> >> >> > > >>>> >> > > >> >> > will > >> >> >> > > >>>> >> > > >> >> > > > > saves us another cross board wire > >> protocol > >> >> >> > > change. > >> >> >> > > >>>> Like > >> >> >> > > >>>> >> you > >> >> >> > > >>>> >> > > >> said, > >> >> >> > > >>>> >> > > >> >> we > >> >> >> > > >>>> >> > > >> >> > > have > >> >> >> > > >>>> >> > > >> >> > > > > to migrate all the clients in all > >> >> languages. > >> >> >> To > >> >> >> > > >>>> some > >> >> >> > > >>>> >> > extent, > >> >> >> > > >>>> >> > > the > >> >> >> > > >>>> >> > > >> >> > effort > >> >> >> > > >>>> >> > > >> >> > > > to > >> >> >> > > >>>> >> > > >> >> > > > > spend on upgrading the clients can > be > >> even > >> >> >> > bigger > >> >> >> > > >>>> than > >> >> >> > > >>>> >> > > >> implementing > >> >> >> > > >>>> >> > > >> >> > the > >> >> >> > > >>>> >> > > >> >> > > > new > >> >> >> > > >>>> >> > > >> >> > > > > feature itself. So there are some > >> >> attractions > >> >> >> > if > >> >> >> > > >>>> we can > >> >> >> > > >>>> >> do > >> >> >> > > >>>> >> > 2 > >> >> >> > > >>>> >> > > >> and 3 > >> >> >> > > >>>> >> > > >> >> > > > together > >> >> >> > > >>>> >> > > >> >> > > > > instead of separately. Maybe after > (1) > >> is > >> >> >> done > >> >> >> > it > >> >> >> > > >>>> will be > >> >> >> > > >>>> >> > > >> easier to > >> >> >> > > >>>> >> > > >> >> > do > >> >> >> > > >>>> >> > > >> >> > > > > protocol migration. But if we are > able > >> to > >> >> >> come > >> >> >> > to > >> >> >> > > >>>> an > >> >> >> > > >>>> >> > > agreement > >> >> >> > > >>>> >> > > >> on > >> >> >> > > >>>> >> > > >> >> the > >> >> >> > > >>>> >> > > >> >> > > > > timestamp solution, I would prefer > to > >> >> have it > >> >> >> > > >>>> together > >> >> >> > > >>>> >> with > >> >> >> > > >>>> >> > > >> >> relative > >> >> >> > > >>>> >> > > >> >> > > > offset > >> >> >> > > >>>> >> > > >> >> > > > > in the interest of avoiding another > >> wire > >> >> >> > protocol > >> >> >> > > >>>> change > >> >> >> > > >>>> >> > (the > >> >> >> > > >>>> >> > > >> >> process > >> >> >> > > >>>> >> > > >> >> > > to > >> >> >> > > >>>> >> > > >> >> > > > > migrate to relative offset is > exactly > >> the > >> >> >> same > >> >> >> > as > >> >> >> > > >>>> migrate > >> >> >> > > >>>> >> > to > >> >> >> > > >>>> >> > > >> >> message > >> >> >> > > >>>> >> > > >> >> > > with > >> >> >> > > >>>> >> > > >> >> > > > > timestamp). > >> >> >> > > >>>> >> > > >> >> > > > > > >> >> >> > > >>>> >> > > >> >> > > > > In terms of timestamp. 
In terms of timestamps, I completely agree that having a client timestamp is more useful if we can make sure the timestamp is good. But in reality that can be a really big *IF*. I think the problem is exactly as Ewen mentioned: if we let the client set the timestamp, it would be very hard for the broker to utilize it. If the broker applies the retention policy based on the client timestamp, one misbehaving producer can potentially completely mess up the retention policy on the broker. Although people don't care about the server-side timestamp, people do care a lot when timestamps break. Searching by timestamp is a really important use case even though it is not used as often as searching by offset. It has a significant direct impact on RTO when there is a cross-cluster failover, as Todd mentioned.

The trick of using max(lastAppendedTimestamp, currentTimeMillis) is to guarantee a monotonic increase of the timestamp. Many commercial systems actually do something similar to solve time skew.
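A minimal sketch of that max(lastAppendedTimestamp, currentTimeMillis) trick, assuming the broker assigns the timestamp at append time. The names are illustrative, not actual Kafka code:

// Illustrative only: assign append timestamps that never decrease, even if the
// wall clock is stepped backwards (e.g. by an NTP adjustment).
public class MonotonicAppendClock {
    private long lastAppendedTimestamp = -1L;

    public synchronized long nextTimestamp() {
        // max(lastAppendedTimestamp, currentTimeMillis) guarantees monotonic increase.
        lastAppendedTimestamp = Math.max(lastAppendedTimestamp, System.currentTimeMillis());
        return lastAppendedTimestamp;
    }
}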
About changing the time, I am not sure people use NTP the way they would use a watch, just setting it forward or backward by an hour or so. The time adjustments I used to do were typically something like a minute per week. So within each second the clock might run a few microseconds slower or faster, but it is not broken completely, so that time-based transactions are not affected. The one-minute change is applied over a week, not instantly.

Personally, I think having a client-side timestamp will be useful if we don't need to put the broker and data integrity at risk. If we have to choose one of them but not both, I would prefer the server-side timestamp, because for the client-side timestamp there is always a plan B, which is putting the timestamp into the payload.

Another reason I am reluctant to use the client-side timestamp is that it is always dangerous to mix the control plane with the data plane.
IP did this and it has caused so many different breaches that people are migrating to something like MPLS. An example in Kafka is that any client can construct a LeaderAndIsrRequest/UpdateMetadataRequest/ControlledShutdownRequest (you name it) and send it to the broker to mess up the entire cluster; also, as we have already noticed, a busy cluster can respond quite slowly to controller messages. So it would really be nice if we can avoid giving clients the power to control log retention.

Thanks,

Jiangjie (Becket) Qin

On Sun, Sep 6, 2015 at 9:54 PM, Todd Palino <tpal...@gmail.com> wrote:

So, with regards to why you want to search by timestamp, the biggest problem I've seen is with consumers who want to reset their consumption to a specific point, whether it is to replay a certain amount of messages, or to rewind to before some problem state existed. This happens more often than anyone would like.
To handle this now we need to constantly export the broker's offset for every partition to a time-series database and then use external processes to query this. I know we're not the only ones doing this. The way the broker handles requests for offsets by timestamp is a little obtuse (explain it to anyone without intimate knowledge of the internal workings of the broker - every time I do I see this). In addition, as Becket pointed out, it causes problems specifically with retention of messages by time when you move partitions around.

I'm deliberately avoiding the discussion of what timestamp to use. I can see the argument either way, though I tend to lean towards the idea that the broker timestamp is the only viable source of truth in this situation.

-Todd
On Sun, Sep 6, 2015 at 7:08 PM, Ewen Cheslack-Postava <e...@confluent.io> wrote:

On Sun, Sep 6, 2015 at 4:57 PM, Jay Kreps <j...@confluent.io> wrote:
> 2. Nobody cares what time it is on the server.

This is a good way of summarizing the issue I was trying to get at, from an app's perspective. Of the 3 stated goals of the KIP, #2 (log retention) is reasonably handled by a server-side timestamp. I really just care that a message is there long enough that I have a chance to process it. #3 (searching by timestamp) only seems useful if we can guarantee the server-side timestamp is close enough to the original client-side timestamp, and any mirror maker step seems to break that (even ignoring any issues with broker availability).
I'm also wondering whether optimizing for search-by-timestamp on the broker is really something we want to do, given that messages aren't really guaranteed to be ordered by application-level timestamps on the broker. Is part of the need for this just due to the current consumer APIs being difficult to work with? For example, could you implement this pretty easily client side, just the way you would broker-side? I'd imagine a couple of random seeks + reads during very rare occasions (i.e. when the app starts up) wouldn't be a problem performance-wise. Or is it also that you need the broker to enforce things like monotonically increasing timestamps, since you can't do the query properly and efficiently without that guarantee, and therefore what applications are actually looking for *is* broker-side timestamps?

-Ewen
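A rough sketch of the client-side approach suggested here: binary-search the partition by fetching individual record timestamps. The RecordTimestampFetcher interface and the method names are hypothetical, introduced only for illustration; nothing like this is being proposed as an actual API:

// Hypothetical helper: fetch the timestamp of the record at a given offset.
interface RecordTimestampFetcher {
    long fetchTimestamp(long offset);
}

// Binary search for the earliest offset whose timestamp is >= targetTime.
// Correct only if timestamps are non-decreasing within the partition.
public final class ClientSideTimeSearch {
    public static long earliestOffsetAtOrAfter(RecordTimestampFetcher fetcher,
                                               long beginOffset,
                                               long endOffset, // exclusive
                                               long targetTime) {
        long lo = beginOffset;
        long hi = endOffset;
        while (lo < hi) {
            long mid = lo + (hi - lo) / 2;
            if (fetcher.fetchTimestamp(mid) < targetTime) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo; // equals endOffset when no record is at or after targetTime
    }
}

The catch is exactly the guarantee asked about above: the search is only correct if timestamps are (at least roughly) monotonic within the partition.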
> Consider cases where data is being copied from a database or from log files. In steady-state the server time is very close to the client time if their clocks are sync'd (see 1), but there will be times of large divergence when the copying process is stopped or falls behind. When this occurs it is clear that the time the data arrived on the server is irrelevant; it is the source timestamp that matters. This is the problem you are trying to fix by retaining the mm timestamp, but really the client should always set the time, with server-side time as a fallback. It would be worth talking to the Samza folks and reading through this blog post (http://radar.oreilly.com/2015/08/the-world-beyond-batch-streaming-101.html) on this subject, since we went through similar learnings on the stream processing side.
>
> I think the implication of these two is that we need a proposal that handles potentially very out-of-order timestamps in some kind of sane-ish way (buggy clients will set something totally wrong as the time).
>
> -Jay
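One possible shape of such "sane-ish" handling, shown purely as an illustration of the trade-off under discussion and not as anything the KIP itself specifies: accept a client-supplied timestamp only if it is within a bounded skew of broker time, otherwise fall back to the broker clock. All names here are made up:

// Illustrative only: bound how far a client-supplied timestamp may diverge
// from broker time before the broker falls back to its own clock.
public class TimestampSanitizer {
    private final long maxSkewMs;

    public TimestampSanitizer(long maxSkewMs) {
        this.maxSkewMs = maxSkewMs;
    }

    public long sanitize(long clientTimestamp) {
        long brokerTime = System.currentTimeMillis();
        // A buggy client that sets something totally wrong gets overridden.
        if (Math.abs(clientTimestamp - brokerTime) > maxSkewMs) {
            return brokerTime;
        }
        return clientTimestamp;
    }
}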
On Sun, Sep 6, 2015 at 4:22 PM, Jay Kreps <j...@confluent.io> wrote:

The magic byte is used to version the message format, so we'll need to make sure that check is in place--I actually don't see it in the current consumer code, which I think is a bug we should fix for the next release (filed KAFKA-2523). The purpose of that field is so there is a clear check on the format rather than the scrambled scenarios Becket describes.

Also, Becket, I don't think just fixing the Java client is sufficient, as that would break other clients--i.e. if anyone writes a v1 message, even by accident, any non-v1-capable consumer will break. I think we probably need a way to have the server ensure a particular message format either at read or write time.

-Jay
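A minimal sketch of the kind of guard being asked for: reject a magic byte the reader does not understand instead of misinterpreting the bytes that follow. The class and constant names are assumptions for illustration, not the actual consumer code:

// Illustrative only: a reader that understands only format version 0 should
// refuse anything newer rather than misparsing the rest of the message.
public class MessageFormatCheck {
    private static final byte MAX_SUPPORTED_MAGIC = 0; // hypothetical constant

    public static void ensureSupported(byte magic) {
        if (magic > MAX_SUPPORTED_MAGIC) {
            throw new IllegalStateException(
                "Unsupported message format version (magic byte = " + magic + ")");
        }
    }
}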
On Thu, Sep 3, 2015 at 3:47 PM, Jiangjie Qin <j...@linkedin.com.invalid> wrote:

Hi Guozhang,

I checked the code again. Actually the CRC check probably won't fail. The newly added timestamp field might be treated as the keyLength instead, so we are likely to receive an IllegalArgumentException when we try to read the key. I'll update the KIP.

Thanks,

Jiangjie (Becket) Qin
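A simplified illustration of why the failure shows up that way: the v0 message layout is crc, magic, attributes, key length, key, value length, value, and the proposal inserts an 8-byte timestamp after the attributes byte, so a reader that still assumes v0 interprets the first four timestamp bytes as the key length. The parsing code below is made up for illustration and is not the actual consumer implementation:

import java.nio.ByteBuffer;

// Illustrative only: how a v0-only reader trips over a v1 message.
public class OldReaderSketch {
    public static byte[] readKeyAssumingV0(ByteBuffer message) {
        message.getInt();                 // crc
        message.get();                    // magic (not validated -- the bug discussed above)
        message.get();                    // attributes
        int keyLength = message.getInt(); // on a v1 message these are really timestamp bytes
        if (keyLength < -1 || keyLength > message.remaining()) {
            // roughly where the IllegalArgumentException comes from
            throw new IllegalArgumentException("Invalid key length " + keyLength);
        }
        byte[] key = new byte[Math.max(keyLength, 0)];
        message.get(key);
        return key;
    }
}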
On Thu, Sep 3, 2015 at 12:48 PM, Jiangjie Qin <j...@linkedin.com> wrote:

Hi Guozhang,

Thanks for reading the KIP. By "old consumer", I meant the ZookeeperConsumerConnector in trunk now, i.e. without this bug fixed. If we fix the ZookeeperConsumerConnector, then it will throw an exception complaining about the unsupported version when it sees message format V1. What I was trying to say is that if we have some ZookeeperConsumerConnector running without the fix, the consumer will complain about a CRC mismatch instead of an unsupported version.

Thanks,

Jiangjie (Becket) Qin

On Thu, Sep 3, 2015 at 12:15 PM, Guozhang Wang <wangg...@gmail.com> wrote:

Thanks for the write-up Jiangjie.

One comment about the migration plan: "For old consumers, if they see the new protocol the CRC check will fail"... Do you mean this bug in the old consumer cannot be fixed in a backward-compatible way?

Guozhang

On Thu, Sep 3, 2015 at 8:35 AM, Jiangjie Qin <j...@linkedin.com.invalid> wrote:

Hi,

We just created KIP-31 to propose a message format change in Kafka.
https://cwiki.apache.org/confluence/display/KAFKA/KIP-31+-+Message+format+change+proposal

As a summary, the motivations are:
1. Avoid server side message re-compression
2. Honor time-based log roll and retention
3. Enable offset search by timestamp at a finer granularity

Feedback and comments are welcome!

Thanks,

Jiangjie (Becket) Qin