The upgrade plan works, but the potentially long interim phase of skipping
zero-copy for down-conversion could be problematic, especially for large
deployments with high consumer fan-out. It is not only going to be memory
overhead but CPU as well, since you need to decompress, rewrite absolute
offsets, then recompress for every v1 fetch. So it may be safer (though
obviously more tedious) to have a multi-step upgrade process, e.g.:
1 - Upgrade brokers, but disable the feature. i.e., either reject producer
requests v2 or down-convert to old message format (with absolute offsets)
2 - Upgrade clients, but they should only use v1 requests
3 - Switch (all or most) consumers to use v2 fetch format (which will use
zero-copy).
4 - Turn on the feature on the brokers to allow producer requests v2
5 - Switch producers to use v2 produce format

(You may want a v1 fetch rate metric and decide to proceed to step 4 only
when that comes down to a trickle)

I'm not sure if the prolonged upgrade process is viable in every scenario.
I think it should work at LinkedIn, for example, but may not for other
environments.

Joel

On Tue, Sep 22, 2015 at 12:55 AM, Jiangjie Qin <j...@linkedin.com.invalid>
wrote:

> Thanks for the explanation, Jay.
> Agreed. We have to keep the offset to be the offset of the last inner
> message.
>
> Jiangjie (Becket) Qin
>
> On Mon, Sep 21, 2015 at 6:21 PM, Jay Kreps <j...@confluent.io> wrote:
>
>> For (3) I don't think we can change the offset in the outer message from
>> what it is today as it is relied upon in the search done in the log
>> layer. The reason it is the offset of the last message rather than the
>> first is to make the offset a least upper bound (i.e. the smallest
>> offset >= fetch_offset). This needs to work the same for both gaps due
>> to compacted topics and gaps due to compressed messages.
>>
>> So imagine you had a compressed set with offsets {45, 46, 47, 48}: if
>> you assigned this compressed set the offset 45, a fetch for 46 would
>> actually skip ahead to 49 (the least upper bound).
>>
>> -Jay
>>
>> On Mon, Sep 21, 2015 at 5:17 PM, Jun Rao <j...@confluent.io> wrote:
>>
>> > Jiangjie,
>> >
>> > Thanks for the writeup. A few comments below.
>> >
>> > 1. We will need to be a bit careful with fetch requests from the
>> > followers. Basically, as we are doing a rolling upgrade of the
>> > brokers, the follower can't start issuing V2 of the fetch request
>> > until the rest of the brokers are ready to process it. So, we probably
>> > need to make use of inter.broker.protocol.version to do the rolling
>> > upgrade. In step 1, we set inter.broker.protocol.version to 0.9 and do
>> > a round of rolling upgrade of the brokers. At this point, all brokers
>> > are capable of processing V2 of fetch requests, but no broker is using
>> > it yet. In step 2, we set inter.broker.protocol.version to 0.10 and do
>> > another round of rolling restart of the brokers. In this step, the
>> > upgraded brokers will start issuing V2 of the fetch request.
>> >
>> > 2. If we do #1, I am not sure if there is still a need for
>> > message.format.version since the broker can start writing messages in
>> > the new format after inter.broker.protocol.version is set to 0.10.
>> >
>> > 3. It wasn't clear from the wiki whether the base offset in the
>> > shallow message is the offset of the first or the last inner message.
>> > It's better to use the offset of the last inner message. This way, the
>> > followers don't have to decompress messages to figure out the next
>> > fetch offset.
>> >
>> > 4. I am not sure that I understand the following sentence in the wiki.
>> > It seems that the relative offsets in a compressed message don't have
>> > to be consecutive. If so, why do we need to update the relative
>> > offsets in the inner messages?
>> > "When the log cleaner compacts log segments, it needs to update the
>> > inner message's relative offset values."
>> >
>> > Thanks,
>> >
>> > Jun
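Jay's {45, 46, 47, 48} example and Jun's point 3 describe the same
invariant: the wrapper message carries the offset of its last inner
message, and a fetch does a least-upper-bound lookup over wrapper offsets.
A minimal sketch of that lookup (hypothetical Java, not the actual
log-layer code; all names are made up):

import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class LeastUpperBoundLookup {
    // wrapper offset (offset of the LAST inner message) -> file position
    private final NavigableMap<Long, Integer> index = new TreeMap<>();

    public void addWrapper(long lastInnerOffset, int filePosition) {
        index.put(lastInnerOffset, filePosition);
    }

    // Least upper bound: the wrapper with the smallest offset >= fetchOffset.
    public Integer lookup(long fetchOffset) {
        Map.Entry<Long, Integer> e = index.ceilingEntry(fetchOffset);
        return e == null ? null : e.getValue();
    }

    public static void main(String[] args) {
        LeastUpperBoundLookup log = new LeastUpperBoundLookup();
        log.addWrapper(44L, 0);   // compressed set {41, 42, 43, 44}
        log.addWrapper(48L, 100); // compressed set {45, 46, 47, 48}
        // A fetch for 46 finds the wrapper at offset 48 and returns the
        // whole set; the consumer simply discards 45. Had the second set
        // been assigned offset 45, ceilingEntry(46) would have skipped to
        // the next set, losing 46-48 -- which is Jay's point.
        System.out.println(log.lookup(46L)); // prints 100
    }
}

This also shows why the follower needs no decompression to compute the next
fetch offset (Jun's point 3): it is just the wrapper offset plus one.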
>> >
>> > On Thu, Sep 17, 2015 at 12:54 PM, Jiangjie Qin <j...@linkedin.com.invalid>
>> > wrote:
>> >
>> > > Hi folks,
>> > >
>> > > Thanks a lot for the feedback on KIP-31 - move to use relative offset.
>> > > (Not including the timestamp and index discussion.)
>> > >
>> > > I updated the migration plan section as we discussed on the KIP
>> > > hangout. I think it is the only concern raised so far. Please let me
>> > > know if there are further comments about the KIP.
>> > >
>> > > Thanks,
>> > >
>> > > Jiangjie (Becket) Qin
>> > >
>> > > On Mon, Sep 14, 2015 at 5:13 PM, Jiangjie Qin <j...@linkedin.com> wrote:
>> > >
>> > > > I just updated KIP-33 to explain the indexing on CreateTime and
>> > > > LogAppendTime respectively. I also used some use cases to compare
>> > > > the two solutions. Although this is for KIP-33, it does give some
>> > > > insight into whether it makes sense to have a per-message
>> > > > LogAppendTime.
>> > > >
>> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-33+-+Add+a+time+based+log+index
>> > > >
>> > > > As a short summary of the conclusions we have already reached on
>> > > > timestamps:
>> > > > 1. It is good to add a timestamp to the message.
>> > > > 2. LogAppendTime should be used for broker policy enforcement (log
>> > > > retention / rolling).
>> > > > 3. It is useful to have a CreateTime in the message format, which is
>> > > > immutable after the producer sends the message.
>> > > >
>> > > > The following questions are still in discussion:
>> > > > 1. Should we also add LogAppendTime to the message format?
>> > > > 2. Which timestamp should we use to build the index?
>> > > >
>> > > > Let's talk about question 1 first because question 2 is actually a
>> > > > follow-up to question 1. Here is what I think:
>> > > > 1a. To enforce broker log policy, theoretically we don't need a
>> > > > per-message LogAppendTime. But if we don't include LogAppendTime in
>> > > > the message, we still need to implement a separate solution to pass
>> > > > log segment timestamps among brokers, which means further
>> > > > complication in replication.
>> > > > 1b. LogAppendTime has some advantages over CreateTime (KIP-33 has a
>> > > > detailed comparison).
>> > > > 1c. We have already exposed offset, which is essentially an internal
>> > > > concept of a message in terms of position. Exposing LogAppendTime
>> > > > means we expose another internal concept of a message in terms of
>> > > > time.
>> > > >
>> > > > Considering the above reasons, personally I think it is worth adding
>> > > > the LogAppendTime to each message.
>> > > >
>> > > > Any thoughts?
>> > > >
>> > > > Thanks,
>> > > >
>> > > > Jiangjie (Becket) Qin
>> > > >
>> > > > On Mon, Sep 14, 2015 at 11:44 AM, Jiangjie Qin <j...@linkedin.com>
>> > > > wrote:
>> > > >
>> > > >> I was trying to send the last email before the KIP hangout, so maybe
>> > > >> I did not think it through completely. By the way, the discussion is
>> > > >> actually more related to KIP-33, i.e. whether we should index on
>> > > >> CreateTime or LogAppendTime. (Although it seems all the discussion
>> > > >> is still in this mailing thread...) The solution in the last email
>> > > >> is for indexing on CreateTime. It is essentially what Jay suggested
>> > > >> except we use a timestamp map instead of a memory-mapped index file.
>> > > >> Please ignore the proposal of using a log compacted topic. The
>> > > >> solution can be simplified to:
>> > > >>
>> > > >> Each broker keeps
>> > > >> 1. a timestamp index map - Map[TopicPartitionSegment, Map[Timestamp,
>> > > >> Offset]]. The timestamp is on a minute boundary.
>> > > >> 2. a timestamp index file for each segment.
>> > > >>
>> > > >> When a broker receives a message (whether leader or follower), it
>> > > >> checks if the timestamp index map contains the timestamp for the
>> > > >> current segment. If the timestamp does not exist, the broker adds
>> > > >> the offset to the map and appends an entry to the timestamp index,
>> > > >> i.e. we only use the index file as a persistent copy of the
>> > > >> timestamp index map.
>> > > >>
>> > > >> When a log segment is deleted, we need to:
>> > > >> 1. delete the TopicPartitionSegment key in the timestamp index map.
>> > > >> 2. delete the timestamp index file.
>> > > >>
>> > > >> This solution assumes we only keep CreateTime in the message. There
>> > > >> are a few trade-offs in this solution:
>> > > >> 1. The granularity of search will be per minute.
>> > > >> 2. The whole timestamp index map has to be in memory all the time.
>> > > >> 3. We need to think about another way to honor log retention time
>> > > >> and time-based log rolling.
>> > > >> 4. We lose the benefits, mentioned earlier, of including
>> > > >> LogAppendTime in the message.
>> > > >>
>> > > >> I am not sure whether this solution is necessarily better than
>> > > >> indexing on LogAppendTime.
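As a rough sketch of the structure described above (hypothetical Java, not
broker code; synchronization and the index-file persistence are elided, and
all names are made up):

import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

public class TimestampIndexMap {
    // key: one entry per (topic, partition, segment), flattened to a String
    private final Map<String, NavigableMap<Long, Long>> index =
            new ConcurrentHashMap<>();

    // Called on append (leader or follower). Offsets increase on append,
    // so putIfAbsent keeps the smallest offset seen in each minute.
    public void maybeAdd(String segmentKey, long timestampMs, long offset) {
        long minute = TimeUnit.MILLISECONDS.toMinutes(timestampMs);
        index.computeIfAbsent(segmentKey, k -> new TreeMap<>())
             .putIfAbsent(minute, offset);
        // A real implementation would also append the new entry to the
        // per-segment timestamp index file so the map survives restarts.
    }

    // Minute-granularity search: smallest recorded offset at or after the
    // target time, or null if the segment has nothing that late.
    public Long offsetForTime(String segmentKey, long targetMs) {
        NavigableMap<Long, Long> m = index.get(segmentKey);
        if (m == null) return null;
        Map.Entry<Long, Long> e =
                m.ceilingEntry(TimeUnit.MILLISECONDS.toMinutes(targetMs));
        return e == null ? null : e.getValue();
    }

    // Segment deletion drops the key (the index file is deleted separately).
    public void onSegmentDelete(String segmentKey) {
        index.remove(segmentKey);
    }
}

The per-minute rounding is what keeps the whole map small enough to hold in
memory, which is exactly trade-offs 1 and 2 above.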
>> > > >>
>> > > >> I will update KIP-33 to explain the solutions to index on CreateTime
>> > > >> and LogAppendTime respectively and put in some more concrete use
>> > > >> cases as well.
>> > > >>
>> > > >> Thanks,
>> > > >>
>> > > >> Jiangjie (Becket) Qin
>> > > >>
>> > > >> On Mon, Sep 14, 2015 at 9:40 AM, Jiangjie Qin <j...@linkedin.com>
>> > > >> wrote:
>> > > >>
>> > > >>> Hi Joel,
>> > > >>>
>> > > >>> Good point about rebuilding the index. I agree that having a
>> > > >>> per-message LogAppendTime might be necessary. About time
>> > > >>> adjustment, the solution sounds promising, but it might be better
>> > > >>> to make it a follow-up of the KIP because it seems a really rare
>> > > >>> use case.
>> > > >>>
>> > > >>> I have another thought on how to manage the out-of-order
>> > > >>> timestamps. Maybe we can do the following:
>> > > >>> Create a special log compacted topic __timestamp_index, similar to
>> > > >>> the offsets topic, where the key would be (TopicPartition,
>> > > >>> TimeStamp_Rounded_To_Minute) and the value is the offset. In
>> > > >>> memory, we keep a map for each TopicPartition whose value is
>> > > >>> (timestamp_rounded_to_minute -> smallest_offset_in_the_minute).
>> > > >>> This way we can search out-of-order messages and make sure no
>> > > >>> message is missing.
>> > > >>>
>> > > >>> Thoughts?
>> > > >>>
>> > > >>> Thanks,
>> > > >>>
>> > > >>> Jiangjie (Becket) Qin
>> > > >>>
>> > > >>> On Fri, Sep 11, 2015 at 12:46 PM, Joel Koshy <jjkosh...@gmail.com>
>> > > >>> wrote:
>> > > >>>
>> > > >>>> Jay had mentioned the scenario of mirror-maker bootstrap, which
>> > > >>>> would effectively reset the logAppendTimestamps for the
>> > > >>>> bootstrapped data. If we don't include logAppendTimestamps in each
>> > > >>>> message there is a similar scenario when rebuilding indexes during
>> > > >>>> recovery. So it seems it may be worth adding that timestamp to
>> > > >>>> messages. The drawback to that is exposing a server-side concept
>> > > >>>> in the protocol (although we already do that with offsets).
>> > > >>>> logAppendTimestamp really should be decided by the broker, so I
>> > > >>>> think the first scenario may have to be written off as a gotcha,
>> > > >>>> but the second may be worth addressing (by adding it to the
>> > > >>>> message format).
>> > > >>>>
>> > > >>>> The other point that Jay raised which needs to be addressed in
>> > > >>>> the proposal (since we require monotonically increasing
>> > > >>>> timestamps in the index) is changing time on the server (I'm a
>> > > >>>> little less concerned about NTP clock skews than a user
>> > > >>>> explicitly changing the server's time - i.e., big clock skews).
>> > > >>>> We would at least want to "set back" all the existing timestamps
>> > > >>>> to guarantee non-decreasing timestamps with future messages. I'm
>> > > >>>> not sure at this point how best to handle that, but we could
>> > > >>>> perhaps have an epoch/base-time (or time-correction) stored in
>> > > >>>> the log directories and base all log index timestamps off that
>> > > >>>> base-time (or corrected). So if at any time you determine that
>> > > >>>> time has changed backwards you can adjust that base-time without
>> > > >>>> having to fix up all the entries. Without knowing the exact diff
>> > > >>>> between the previous clock and the new clock we cannot adjust
>> > > >>>> the times exactly, but we can at least ensure increasing
>> > > >>>> timestamps.
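A minimal sketch of that base-time idea (hypothetical Java; it assumes the
base time is persisted in the log directory as Joel suggests):

public class CorrectedClock {
    private long baseTimeMs;       // persisted in the log directory
    private long lastCorrectedMs;  // last timestamp handed out

    public CorrectedClock(long persistedBaseTimeMs, long lastIndexedMs) {
        this.baseTimeMs = persistedBaseTimeMs;
        this.lastCorrectedMs = lastIndexedMs;
    }

    // Returns a non-decreasing timestamp even if the wall clock jumps
    // backwards (e.g. an operator resets the server time).
    public synchronized long now() {
        long corrected = System.currentTimeMillis() - baseTimeMs;
        if (corrected < lastCorrectedMs) {
            // Clock moved back: fold the jump into the base so future
            // timestamps continue from the last value seen, instead of
            // rewriting every existing index entry.
            baseTimeMs -= (lastCorrectedMs - corrected);
            corrected = lastCorrectedMs;
        }
        lastCorrectedMs = corrected;
        return corrected; // this (base-relative) value goes into the index
    }
}

As Joel notes, without the exact diff between the old and new clocks the
correction cannot be exact; it only guarantees the timestamps keep
increasing.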
>> > > >>>>
>> > > >>>> On Fri, Sep 11, 2015 at 10:52 AM, Jiangjie Qin
>> > > >>>> <j...@linkedin.com.invalid> wrote:
>> > > >>>> > Ewen and Jay,
>> > > >>>> >
>> > > >>>> > The way I see it, LogAppendTime is another form of "offset". It
>> > > >>>> > serves the following purposes:
>> > > >>>> > 1. Locate messages not only by position, but also by time. The
>> > > >>>> > difference from offset is that a timestamp is not unique for
>> > > >>>> > all messages.
>> > > >>>> > 2. Allow the broker to manage messages based on time, e.g.
>> > > >>>> > retention, rolling.
>> > > >>>> > 3. Provide convenience for users to search for messages not
>> > > >>>> > only by offset, but also by timestamp.
>> > > >>>> >
>> > > >>>> > For purpose (2) we don't need a per-message server timestamp.
>> > > >>>> > We only need a per-log-segment server timestamp and to
>> > > >>>> > propagate it among brokers.
>> > > >>>> >
>> > > >>>> > For (1) and (3), we need a per-message timestamp. Then the
>> > > >>>> > question is whether we should use CreateTime or LogAppendTime?
>> > > >>>> >
>> > > >>>> > I completely agree that an application timestamp is very useful
>> > > >>>> > for many use cases. But it seems to me that having Kafka
>> > > >>>> > understand and maintain application timestamps is a bit over
>> > > >>>> > demanding. So I think there is value in passing on CreateTime
>> > > >>>> > for application convenience, but I am not sure it can replace
>> > > >>>> > LogAppendTime. Managing out-of-order CreateTime is equivalent
>> > > >>>> > to allowing producers to send their own offsets and asking the
>> > > >>>> > broker to manage the offsets for them. It is going to be very
>> > > >>>> > hard to maintain and could create huge performance/functional
>> > > >>>> > issues because of the complicated logic.
>> > > >>>> > >> > > >>>> > About whether we should expose LogAppendTime to broker, I agree >> > that >> > > >>>> server >> > > >>>> > timestamp is internal to broker, but isn't offset also an >> internal >> > > >>>> concept? >> > > >>>> > Arguably it's not provided by producer so consumer application >> > logic >> > > >>>> does >> > > >>>> > not have to know offset. But user needs to know offset because >> > they >> > > >>>> need to >> > > >>>> > know "where is the message" in the log. LogAppendTime provides >> the >> > > >>>> answer >> > > >>>> > of "When was the message appended" to the log. So personally I >> > think >> > > >>>> it is >> > > >>>> > reasonable to expose the LogAppendTime to consumers. >> > > >>>> > >> > > >>>> > I can see some use cases of exposing the LogAppendTime, to name >> > > some: >> > > >>>> > 1. Let's say broker has 7 days of log retention, some >> application >> > > >>>> wants to >> > > >>>> > reprocess the data in past 3 days. User can simply provide the >> > > >>>> timestamp >> > > >>>> > and start consume. >> > > >>>> > 2. User can easily know lag by time. >> > > >>>> > 3. Cross cluster fail over. This is a more complicated use case, >> > > >>>> there are >> > > >>>> > two goals: 1) Not lose message; and 2) do not reconsume tons of >> > > >>>> messages. >> > > >>>> > Only knowing offset of cluster A won't help with finding fail >> over >> > > >>>> point in >> > > >>>> > cluster B because an offset of a cluster means nothing to >> another >> > > >>>> cluster. >> > > >>>> > Timestamp however is a good cross cluster reference in this >> case. >> > > >>>> > >> > > >>>> > Thanks, >> > > >>>> > >> > > >>>> > Jiangjie (Becket) Qin >> > > >>>> > >> > > >>>> > On Thu, Sep 10, 2015 at 9:28 PM, Ewen Cheslack-Postava < >> > > >>>> e...@confluent.io> >> > > >>>> > wrote: >> > > >>>> > >> > > >>>> >> Re: MM preserving timestamps: Yes, this was how I interpreted >> the >> > > >>>> point in >> > > >>>> >> the KIP and I only raised the issue because it restricts the >> > > >>>> usefulness of >> > > >>>> >> timestamps anytime MM is involved. I agree it's not a deal >> > breaker, >> > > >>>> but I >> > > >>>> >> wanted to understand exact impact of the change. Some users >> seem >> > to >> > > >>>> want to >> > > >>>> >> be able to seek by application-defined timestamps (despite the >> > many >> > > >>>> obvious >> > > >>>> >> issues involved), and the proposal clearly would not support >> that >> > > >>>> unless >> > > >>>> >> the timestamps submitted with the produce requests were >> > respected. >> > > >>>> If we >> > > >>>> >> ignore client submitted timestamps, then we probably want to >> try >> > to >> > > >>>> hide >> > > >>>> >> the timestamps as much as possible in any public interface >> (e.g. >> > > >>>> never >> > > >>>> >> shows up in any public consumer APIs), but expose it just >> enough >> > to >> > > >>>> be >> > > >>>> >> useful for operational purposes. >> > > >>>> >> >> > > >>>> >> Sorry if my devil's advocate position / attempt to map the >> design >> > > >>>> space led >> > > >>>> >> to some confusion! >> > > >>>> >> >> > > >>>> >> -Ewen >> > > >>>> >> >> > > >>>> >> >> > > >>>> >> On Thu, Sep 10, 2015 at 5:48 PM, Jay Kreps <j...@confluent.io> >> > > wrote: >> > > >>>> >> >> > > >>>> >> > Ah, I see, I think I misunderstood about MM, it was called >> out >> > in >> > > >>>> the >> > > >>>> >> > proposal and I thought you were saying you'd retain the >> > timestamp >> > > >>>> but I >> > > >>>> >> > think you're calling out that you're not. 
In that case you do >> > > have >> > > >>>> the >> > > >>>> >> > opposite problem, right? When you add mirroring for a topic >> all >> > > >>>> that data >> > > >>>> >> > will have a timestamp of now and retention won't be right. >> Not >> > a >> > > >>>> blocker >> > > >>>> >> > but a bit of a gotcha. >> > > >>>> >> > >> > > >>>> >> > -Jay >> > > >>>> >> > >> > > >>>> >> > >> > > >>>> >> > >> > > >>>> >> > On Thu, Sep 10, 2015 at 5:40 PM, Joel Koshy < >> > jjkosh...@gmail.com >> > > > >> > > >>>> wrote: >> > > >>>> >> > >> > > >>>> >> > > > Don't you see all the same issues you see with >> > client-defined >> > > >>>> >> > timestamp's >> > > >>>> >> > > > if you let mm control the timestamp as you were >> proposing? >> > > >>>> That means >> > > >>>> >> > > time >> > > >>>> >> > > >> > > >>>> >> > > Actually I don't think that was in the proposal (or was >> it?). >> > > >>>> i.e., I >> > > >>>> >> > > think it was always supposed to be controlled by the broker >> > > (and >> > > >>>> not >> > > >>>> >> > > MM). >> > > >>>> >> > > >> > > >>>> >> > > > Also, Joel, can you just confirm that you guys have >> talked >> > > >>>> through >> > > >>>> >> the >> > > >>>> >> > > > whole timestamp thing with the Samza folks at LI? The >> > reason >> > > I >> > > >>>> ask >> > > >>>> >> > about >> > > >>>> >> > > > this is that Samza and Kafka Streams (KIP-28) are both >> > trying >> > > >>>> to rely >> > > >>>> >> > on >> > > >>>> >> > > >> > > >>>> >> > > We have not. This is a good point - we will follow-up. >> > > >>>> >> > > >> > > >>>> >> > > > WRT your idea of a FollowerFetchRequestI had thought of a >> > > >>>> similar >> > > >>>> >> idea >> > > >>>> >> > > > where we use the leader's timestamps to approximately set >> > the >> > > >>>> >> > follower's >> > > >>>> >> > > > timestamps. I had thought of just adding a partition >> > metadata >> > > >>>> request >> > > >>>> >> > > that >> > > >>>> >> > > > would subsume the current offset/time lookup and could be >> > > used >> > > >>>> by the >> > > >>>> >> > > > follower to try to approximately keep their timestamps >> > > kosher. >> > > >>>> It's a >> > > >>>> >> > > > little hacky and doesn't help with MM but it is also >> maybe >> > > less >> > > >>>> >> > invasive >> > > >>>> >> > > so >> > > >>>> >> > > > that approach could be viable. >> > > >>>> >> > > >> > > >>>> >> > > That would also work, but perhaps responding with the >> actual >> > > >>>> leader >> > > >>>> >> > > offset-timestamp entries (corresponding to the fetched >> > portion) >> > > >>>> would >> > > >>>> >> > > be exact and it should be small as well. Anyway, the main >> > > >>>> motivation >> > > >>>> >> > > in this was to avoid leaking server-side timestamps to the >> > > >>>> >> > > message-format if people think it is worth it so the >> > > >>>> alternatives are >> > > >>>> >> > > implementation details. My original instinct was that it >> also >> > > >>>> avoids a >> > > >>>> >> > > backwards incompatible change (but it does not because we >> > also >> > > >>>> have >> > > >>>> >> > > the relative offset change). 
>> > > >>>> >> > > >> > > >>>> >> > > Thanks, >> > > >>>> >> > > >> > > >>>> >> > > Joel >> > > >>>> >> > > >> > > >>>> >> > > > >> > > >>>> >> > > > >> > > >>>> >> > > > >> > > >>>> >> > > > On Thu, Sep 10, 2015 at 3:36 PM, Joel Koshy < >> > > >>>> jjkosh...@gmail.com> >> > > >>>> >> > wrote: >> > > >>>> >> > > > >> > > >>>> >> > > >> I just wanted to comment on a few points made earlier in >> > > this >> > > >>>> >> thread: >> > > >>>> >> > > >> >> > > >>>> >> > > >> Concerns on clock skew: at least for the original >> > proposal's >> > > >>>> scope >> > > >>>> >> > > >> (which was more for honoring retention broker-side) this >> > > >>>> would only >> > > >>>> >> be >> > > >>>> >> > > >> an issue when spanning leader movements right? i.e., >> > leader >> > > >>>> >> migration >> > > >>>> >> > > >> latency has to be much less than clock skew for this to >> > be a >> > > >>>> real >> > > >>>> >> > > >> issue wouldn’t it? >> > > >>>> >> > > >> >> > > >>>> >> > > >> Client timestamp vs broker timestamp: I’m not sure Kafka >> > > >>>> (brokers) >> > > >>>> >> are >> > > >>>> >> > > >> the right place to reason about client-side timestamps >> > > >>>> precisely due >> > > >>>> >> > > >> to the nuances that have been discussed at length in >> this >> > > >>>> thread. My >> > > >>>> >> > > >> preference would have been to the timestamp (now called >> > > >>>> >> > > >> LogAppendTimestamp) have nothing to do with the >> > > applications. >> > > >>>> Ewen >> > > >>>> >> > > >> raised a valid concern about leaking such >> > > >>>> “private/server-side” >> > > >>>> >> > > >> timestamps into the protocol spec. i.e., it is fine to >> > have >> > > >>>> the >> > > >>>> >> > > >> CreateTime which is expressly client-provided and >> > immutable >> > > >>>> >> > > >> thereafter, but the LogAppendTime is also going part of >> > the >> > > >>>> protocol >> > > >>>> >> > > >> and it would be good to avoid exposure (to client >> > > developers) >> > > >>>> if >> > > >>>> >> > > >> possible. Ok, so here is a slightly different approach >> > that >> > > I >> > > >>>> was >> > > >>>> >> just >> > > >>>> >> > > >> thinking about (and did not think too far so it may not >> > > >>>> work): do >> > > >>>> >> not >> > > >>>> >> > > >> add the LogAppendTime to messages. Instead, build the >> > > >>>> time-based >> > > >>>> >> index >> > > >>>> >> > > >> on the server side on message arrival time alone. >> > Introduce >> > > a >> > > >>>> new >> > > >>>> >> > > >> ReplicaFetchRequest/Response pair. ReplicaFetchResponses >> > > will >> > > >>>> also >> > > >>>> >> > > >> include the slice of the time-based index for the >> follower >> > > >>>> broker. >> > > >>>> >> > > >> This way we can at least keep timestamps aligned across >> > > >>>> brokers for >> > > >>>> >> > > >> retention purposes. We do lose the append timestamp for >> > > >>>> mirroring >> > > >>>> >> > > >> pipelines (which appears to be the case in KIP-32 as >> > well). 
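A sketch of the shape that idea could take (hypothetical types; the
ReplicaFetchRequest/Response pair is only a suggestion in this thread, and
none of these classes exist in Kafka):

import java.util.List;

public class ReplicaFetchResponsePartition {
    public static class TimeIndexEntry {
        public final long timestampMs; // leader's append time
        public final long offset;      // first offset at/after that time
        public TimeIndexEntry(long timestampMs, long offset) {
            this.timestampMs = timestampMs;
            this.offset = offset;
        }
    }

    public final byte[] messageSetBytes;              // the fetched messages
    public final List<TimeIndexEntry> timeIndexSlice; // leader's index
                                                      // entries for the slice

    public ReplicaFetchResponsePartition(byte[] messageSetBytes,
                                         List<TimeIndexEntry> timeIndexSlice) {
        this.messageSetBytes = messageSetBytes;
        this.timeIndexSlice = timeIndexSlice;
    }
    // On receipt the follower appends the timeIndexSlice entries to its own
    // time index instead of stamping the messages with its local clock,
    // keeping retention decisions aligned with the leader.
}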
>> > > >>>> >> > > >> >> > > >>>> >> > > >> Configurable index granularity: We can do this but I’m >> not >> > > >>>> sure it >> > > >>>> >> is >> > > >>>> >> > > >> very useful and as Jay noted, a major change from the >> old >> > > >>>> proposal >> > > >>>> >> > > >> linked from the KIP is the sparse time-based index which >> > we >> > > >>>> felt was >> > > >>>> >> > > >> essential to bound memory usage (and having timestamps >> on >> > > >>>> each log >> > > >>>> >> > > >> index entry was probably a big waste since in the common >> > > case >> > > >>>> >> several >> > > >>>> >> > > >> messages span the same timestamp). BTW another benefit >> of >> > > the >> > > >>>> second >> > > >>>> >> > > >> index is that it makes it easier to roll-back or throw >> > away >> > > if >> > > >>>> >> > > >> necessary (vs. modifying the existing index format) - >> > > >>>> although that >> > > >>>> >> > > >> obviously does not help with rolling back the timestamp >> > > >>>> change in >> > > >>>> >> the >> > > >>>> >> > > >> message format, but it is one less thing to worry about. >> > > >>>> >> > > >> >> > > >>>> >> > > >> Versioning: I’m not sure everyone is saying the same >> thing >> > > >>>> wrt the >> > > >>>> >> > > >> scope of this. There is the record format change, but I >> > also >> > > >>>> think >> > > >>>> >> > > >> this ties into all of the API versioning that we already >> > > have >> > > >>>> in >> > > >>>> >> > > >> Kafka. The current API versioning approach works fine >> for >> > > >>>> >> > > >> upgrades/downgrades across official Kafka releases, but >> > not >> > > >>>> so well >> > > >>>> >> > > >> between releases. (We almost got bitten by this at >> > LinkedIn >> > > >>>> with the >> > > >>>> >> > > >> recent changes to various requests but were able to work >> > > >>>> around >> > > >>>> >> > > >> these.) We can clarify this in the follow-up KIP. >> > > >>>> >> > > >> >> > > >>>> >> > > >> Thanks, >> > > >>>> >> > > >> >> > > >>>> >> > > >> Joel >> > > >>>> >> > > >> >> > > >>>> >> > > >> >> > > >>>> >> > > >> On Thu, Sep 10, 2015 at 3:00 PM, Jiangjie Qin >> > > >>>> >> > <j...@linkedin.com.invalid >> > > >>>> >> > > > >> > > >>>> >> > > >> wrote: >> > > >>>> >> > > >> > Hi Jay, >> > > >>>> >> > > >> > >> > > >>>> >> > > >> > I just changed the KIP title and updated the KIP page. >> > > >>>> >> > > >> > >> > > >>>> >> > > >> > And yes, we are working on a general version control >> > > >>>> proposal to >> > > >>>> >> > make >> > > >>>> >> > > the >> > > >>>> >> > > >> > protocol migration like this more smooth. I will also >> > > >>>> create a KIP >> > > >>>> >> > for >> > > >>>> >> > > >> that >> > > >>>> >> > > >> > soon. >> > > >>>> >> > > >> > >> > > >>>> >> > > >> > Thanks, >> > > >>>> >> > > >> > >> > > >>>> >> > > >> > Jiangjie (Becket) Qin >> > > >>>> >> > > >> > >> > > >>>> >> > > >> > >> > > >>>> >> > > >> > On Thu, Sep 10, 2015 at 2:21 PM, Jay Kreps < >> > > >>>> j...@confluent.io> >> > > >>>> >> > wrote: >> > > >>>> >> > > >> > >> > > >>>> >> > > >> >> Great, can we change the name to something related to >> > the >> > > >>>> >> > > >> change--"KIP-31: >> > > >>>> >> > > >> >> Move to relative offsets in compressed message sets". >> > > >>>> >> > > >> >> >> > > >>>> >> > > >> >> Also you had mentioned before you were going to >> expand >> > on >> > > >>>> the >> > > >>>> >> > > mechanics >> > > >>>> >> > > >> of >> > > >>>> >> > > >> >> handling these log format changes, right? 
>> > > >>>> >> > > >> >> >> > > >>>> >> > > >> >> -Jay >> > > >>>> >> > > >> >> >> > > >>>> >> > > >> >> On Thu, Sep 10, 2015 at 12:42 PM, Jiangjie Qin >> > > >>>> >> > > >> <j...@linkedin.com.invalid> >> > > >>>> >> > > >> >> wrote: >> > > >>>> >> > > >> >> >> > > >>>> >> > > >> >> > Neha and Jay, >> > > >>>> >> > > >> >> > >> > > >>>> >> > > >> >> > Thanks a lot for the feedback. Good point about >> > > >>>> splitting the >> > > >>>> >> > > >> >> discussion. I >> > > >>>> >> > > >> >> > have split the proposal to three KIPs and it does >> > make >> > > >>>> each >> > > >>>> >> > > discussion >> > > >>>> >> > > >> >> more >> > > >>>> >> > > >> >> > clear: >> > > >>>> >> > > >> >> > KIP-31 - Message format change (Use relative >> offset) >> > > >>>> >> > > >> >> > KIP-32 - Add CreateTime and LogAppendTime to Kafka >> > > >>>> message >> > > >>>> >> > > >> >> > KIP-33 - Build a time-based log index >> > > >>>> >> > > >> >> > >> > > >>>> >> > > >> >> > KIP-33 can be a follow up KIP for KIP-32, so we can >> > > >>>> discuss >> > > >>>> >> about >> > > >>>> >> > > >> KIP-31 >> > > >>>> >> > > >> >> > and KIP-32 first for now. I will create a separate >> > > >>>> discussion >> > > >>>> >> > > thread >> > > >>>> >> > > >> for >> > > >>>> >> > > >> >> > KIP-32 and reply the concerns you raised regarding >> > the >> > > >>>> >> timestamp. >> > > >>>> >> > > >> >> > >> > > >>>> >> > > >> >> > So far it looks there is no objection to KIP-31. >> > Since >> > > I >> > > >>>> >> removed >> > > >>>> >> > a >> > > >>>> >> > > few >> > > >>>> >> > > >> >> part >> > > >>>> >> > > >> >> > from previous KIP and only left the relative offset >> > > >>>> proposal, >> > > >>>> >> it >> > > >>>> >> > > >> would be >> > > >>>> >> > > >> >> > great if people can take another look to see if >> there >> > > is >> > > >>>> any >> > > >>>> >> > > concerns. >> > > >>>> >> > > >> >> > >> > > >>>> >> > > >> >> > Thanks, >> > > >>>> >> > > >> >> > >> > > >>>> >> > > >> >> > Jiangjie (Becket) Qin >> > > >>>> >> > > >> >> > >> > > >>>> >> > > >> >> > >> > > >>>> >> > > >> >> > On Tue, Sep 8, 2015 at 1:28 PM, Neha Narkhede < >> > > >>>> >> n...@confluent.io >> > > >>>> >> > > >> > > >>>> >> > > >> wrote: >> > > >>>> >> > > >> >> > >> > > >>>> >> > > >> >> > > Becket, >> > > >>>> >> > > >> >> > > >> > > >>>> >> > > >> >> > > Nice write-up. Few thoughts - >> > > >>>> >> > > >> >> > > >> > > >>>> >> > > >> >> > > I'd split up the discussion for simplicity. 
Note >> > that >> > > >>>> you can >> > > >>>> >> > > always >> > > >>>> >> > > >> >> > group >> > > >>>> >> > > >> >> > > several of these in one patch to reduce the >> > protocol >> > > >>>> changes >> > > >>>> >> > > people >> > > >>>> >> > > >> >> have >> > > >>>> >> > > >> >> > to >> > > >>>> >> > > >> >> > > deal with.This is just a suggestion, but I think >> > the >> > > >>>> >> following >> > > >>>> >> > > split >> > > >>>> >> > > >> >> > might >> > > >>>> >> > > >> >> > > make it easier to tackle the changes being >> > proposed - >> > > >>>> >> > > >> >> > > >> > > >>>> >> > > >> >> > > - Relative offsets >> > > >>>> >> > > >> >> > > - Introducing the concept of time >> > > >>>> >> > > >> >> > > - Time-based indexing (separate the usage of >> the >> > > >>>> timestamp >> > > >>>> >> > > field >> > > >>>> >> > > >> >> from >> > > >>>> >> > > >> >> > > how/whether we want to include a timestamp in >> > the >> > > >>>> message) >> > > >>>> >> > > >> >> > > >> > > >>>> >> > > >> >> > > I'm a +1 on relative offsets, we should've done >> it >> > > >>>> back when >> > > >>>> >> we >> > > >>>> >> > > >> >> > introduced >> > > >>>> >> > > >> >> > > it. Other than reducing the CPU overhead, this >> will >> > > >>>> also >> > > >>>> >> reduce >> > > >>>> >> > > the >> > > >>>> >> > > >> >> > garbage >> > > >>>> >> > > >> >> > > collection overhead on the brokers. >> > > >>>> >> > > >> >> > > >> > > >>>> >> > > >> >> > > On the timestamp field, I generally agree that we >> > > >>>> should add >> > > >>>> >> a >> > > >>>> >> > > >> >> timestamp >> > > >>>> >> > > >> >> > to >> > > >>>> >> > > >> >> > > a Kafka message but I'm not quite sold on how >> this >> > > KIP >> > > >>>> >> suggests >> > > >>>> >> > > the >> > > >>>> >> > > >> >> > > timestamp be set. Will avoid repeating the >> > downsides >> > > >>>> of a >> > > >>>> >> > broker >> > > >>>> >> > > >> side >> > > >>>> >> > > >> >> > > timestamp mentioned previously in this thread. I >> > > think >> > > >>>> the >> > > >>>> >> > topic >> > > >>>> >> > > of >> > > >>>> >> > > >> >> > > including a timestamp in a Kafka message >> requires a >> > > >>>> lot more >> > > >>>> >> > > thought >> > > >>>> >> > > >> >> and >> > > >>>> >> > > >> >> > > details than what's in this KIP. I'd suggest we >> > make >> > > >>>> it a >> > > >>>> >> > > separate >> > > >>>> >> > > >> KIP >> > > >>>> >> > > >> >> > that >> > > >>>> >> > > >> >> > > includes a list of all the different use cases >> for >> > > the >> > > >>>> >> > timestamp >> > > >>>> >> > > >> >> (beyond >> > > >>>> >> > > >> >> > > log retention) including stream processing and >> > > discuss >> > > >>>> >> > tradeoffs >> > > >>>> >> > > of >> > > >>>> >> > > >> >> > > including client and broker side timestamps. >> > > >>>> >> > > >> >> > > >> > > >>>> >> > > >> >> > > Agree with the benefit of time-based indexing, >> but >> > > >>>> haven't >> > > >>>> >> had >> > > >>>> >> > a >> > > >>>> >> > > >> chance >> > > >>>> >> > > >> >> > to >> > > >>>> >> > > >> >> > > dive into the design details yet. 
>> > > >>>> >> > > >> >> > > >> > > >>>> >> > > >> >> > > Thanks, >> > > >>>> >> > > >> >> > > Neha >> > > >>>> >> > > >> >> > > >> > > >>>> >> > > >> >> > > On Tue, Sep 8, 2015 at 10:57 AM, Jay Kreps < >> > > >>>> j...@confluent.io >> > > >>>> >> > >> > > >>>> >> > > >> wrote: >> > > >>>> >> > > >> >> > > >> > > >>>> >> > > >> >> > > > Hey Beckett, >> > > >>>> >> > > >> >> > > > >> > > >>>> >> > > >> >> > > > I was proposing splitting up the KIP just for >> > > >>>> simplicity of >> > > >>>> >> > > >> >> discussion. >> > > >>>> >> > > >> >> > > You >> > > >>>> >> > > >> >> > > > can still implement them in one patch. I think >> > > >>>> otherwise it >> > > >>>> >> > > will >> > > >>>> >> > > >> be >> > > >>>> >> > > >> >> > hard >> > > >>>> >> > > >> >> > > to >> > > >>>> >> > > >> >> > > > discuss/vote on them since if you like the >> offset >> > > >>>> proposal >> > > >>>> >> > but >> > > >>>> >> > > not >> > > >>>> >> > > >> >> the >> > > >>>> >> > > >> >> > > time >> > > >>>> >> > > >> >> > > > proposal what do you do? >> > > >>>> >> > > >> >> > > > >> > > >>>> >> > > >> >> > > > Introducing a second notion of time into Kafka >> > is a >> > > >>>> pretty >> > > >>>> >> > > massive >> > > >>>> >> > > >> >> > > > philosophical change so it kind of warrants >> it's >> > > own >> > > >>>> KIP I >> > > >>>> >> > > think >> > > >>>> >> > > >> it >> > > >>>> >> > > >> >> > isn't >> > > >>>> >> > > >> >> > > > just "Change message format". >> > > >>>> >> > > >> >> > > > >> > > >>>> >> > > >> >> > > > WRT time I think one thing to clarify in the >> > > >>>> proposal is >> > > >>>> >> how >> > > >>>> >> > MM >> > > >>>> >> > > >> will >> > > >>>> >> > > >> >> > have >> > > >>>> >> > > >> >> > > > access to set the timestamp? Presumably this >> will >> > > be >> > > >>>> a new >> > > >>>> >> > > field >> > > >>>> >> > > >> in >> > > >>>> >> > > >> >> > > > ProducerRecord, right? If so then any user can >> > set >> > > >>>> the >> > > >>>> >> > > timestamp, >> > > >>>> >> > > >> >> > right? >> > > >>>> >> > > >> >> > > > I'm not sure you answered the questions around >> > how >> > > >>>> this >> > > >>>> >> will >> > > >>>> >> > > work >> > > >>>> >> > > >> for >> > > >>>> >> > > >> >> > MM >> > > >>>> >> > > >> >> > > > since when MM retains timestamps from multiple >> > > >>>> partitions >> > > >>>> >> > they >> > > >>>> >> > > >> will >> > > >>>> >> > > >> >> > then >> > > >>>> >> > > >> >> > > be >> > > >>>> >> > > >> >> > > > out of order and in the past (so the >> > > >>>> >> > max(lastAppendedTimestamp, >> > > >>>> >> > > >> >> > > > currentTimeMillis) override you proposed will >> not >> > > >>>> work, >> > > >>>> >> > > right?). >> > > >>>> >> > > >> If >> > > >>>> >> > > >> >> we >> > > >>>> >> > > >> >> > > > don't do this then when you set up mirroring >> the >> > > >>>> data will >> > > >>>> >> > all >> > > >>>> >> > > be >> > > >>>> >> > > >> new >> > > >>>> >> > > >> >> > and >> > > >>>> >> > > >> >> > > > you have the same retention problem you >> > described. >> > > >>>> Maybe I >> > > >>>> >> > > missed >> > > >>>> >> > > >> >> > > > something...? 
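For reference, the max(lastAppendedTimestamp, currentTimeMillis) override
being discussed amounts to something like this (a sketch, not actual broker
code):

public class MonotonicBrokerClock {
    private long lastAppendedTimestamp = 0L;

    // Timestamp assigned to the next appended message. Monotonically
    // non-decreasing even if the system clock steps backwards.
    public synchronized long nextTimestamp() {
        lastAppendedTimestamp =
                Math.max(lastAppendedTimestamp, System.currentTimeMillis());
        return lastAppendedTimestamp;
    }
}

Any mirrored message carrying an older source timestamp would be clamped up
to the destination broker's clock, which is exactly Jay's point about
bootstrapped data all looking new for retention purposes.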
>> > > >>>> >> > > >> >> > > > >> > > >>>> >> > > >> >> > > > My main motivation is that given that both >> Samza >> > > and >> > > >>>> Kafka >> > > >>>> >> > > streams >> > > >>>> >> > > >> >> are >> > > >>>> >> > > >> >> > > > doing work that implies a mandatory >> > client-defined >> > > >>>> notion >> > > >>>> >> of >> > > >>>> >> > > >> time, I >> > > >>>> >> > > >> >> > > really >> > > >>>> >> > > >> >> > > > think introducing a different mandatory notion >> of >> > > >>>> time in >> > > >>>> >> > > Kafka is >> > > >>>> >> > > >> >> > going >> > > >>>> >> > > >> >> > > to >> > > >>>> >> > > >> >> > > > be quite odd. We should think hard about how >> > > >>>> client-defined >> > > >>>> >> > > time >> > > >>>> >> > > >> >> could >> > > >>>> >> > > >> >> > > > work. I'm not sure if it can, but I'm also not >> > sure >> > > >>>> that it >> > > >>>> >> > > can't. >> > > >>>> >> > > >> >> > Having >> > > >>>> >> > > >> >> > > > both will be odd. Did you chat about this with >> > > >>>> Yi/Kartik on >> > > >>>> >> > the >> > > >>>> >> > > >> Samza >> > > >>>> >> > > >> >> > > side? >> > > >>>> >> > > >> >> > > > >> > > >>>> >> > > >> >> > > > When you are saying it won't work you are >> > assuming >> > > >>>> some >> > > >>>> >> > > particular >> > > >>>> >> > > >> >> > > > implementation? Maybe that the index is a >> > > >>>> monotonically >> > > >>>> >> > > increasing >> > > >>>> >> > > >> >> set >> > > >>>> >> > > >> >> > of >> > > >>>> >> > > >> >> > > > pointers to the least record with a timestamp >> > > larger >> > > >>>> than >> > > >>>> >> the >> > > >>>> >> > > >> index >> > > >>>> >> > > >> >> > time? >> > > >>>> >> > > >> >> > > > In other words a search for time X gives the >> > > largest >> > > >>>> offset >> > > >>>> >> > at >> > > >>>> >> > > >> which >> > > >>>> >> > > >> >> > all >> > > >>>> >> > > >> >> > > > records are <= X? >> > > >>>> >> > > >> >> > > > >> > > >>>> >> > > >> >> > > > For retention, I agree with the problem you >> point >> > > >>>> out, but >> > > >>>> >> I >> > > >>>> >> > > think >> > > >>>> >> > > >> >> what >> > > >>>> >> > > >> >> > > you >> > > >>>> >> > > >> >> > > > are saying in that case is that you want a size >> > > >>>> limit too. >> > > >>>> >> If >> > > >>>> >> > > you >> > > >>>> >> > > >> use >> > > >>>> >> > > >> >> > > > system time you actually hit the same problem: >> > say >> > > >>>> you do a >> > > >>>> >> > > full >> > > >>>> >> > > >> dump >> > > >>>> >> > > >> >> > of >> > > >>>> >> > > >> >> > > a >> > > >>>> >> > > >> >> > > > DB table with a setting of 7 days retention, >> your >> > > >>>> retention >> > > >>>> >> > > will >> > > >>>> >> > > >> >> > actually >> > > >>>> >> > > >> >> > > > not get enforced for the first 7 days because >> the >> > > >>>> data is >> > > >>>> >> > "new >> > > >>>> >> > > to >> > > >>>> >> > > >> >> > Kafka". >> > > >>>> >> > > >> >> > > > >> > > >>>> >> > > >> >> > > > -Jay >> > > >>>> >> > > >> >> > > > >> > > >>>> >> > > >> >> > > > >> > > >>>> >> > > >> >> > > > On Mon, Sep 7, 2015 at 10:44 AM, Jiangjie Qin >> > > >>>> >> > > >> >> > <j...@linkedin.com.invalid >> > > >>>> >> > > >> >> > > > >> > > >>>> >> > > >> >> > > > wrote: >> > > >>>> >> > > >> >> > > > >> > > >>>> >> > > >> >> > > > > Jay, >> > > >>>> >> > > >> >> > > > > >> > > >>>> >> > > >> >> > > > > Thanks for the comments. Yes, there are >> > actually >> > > >>>> three >> > > >>>> >> > > >> proposals as >> > > >>>> >> > > >> >> > you >> > > >>>> >> > > >> >> > > > > pointed out. 
>> > > >>>> >> > > >> >> > > > > >> > > >>>> >> > > >> >> > > > > We will have a separate proposal for (1) - >> > > version >> > > >>>> >> control >> > > >>>> >> > > >> >> mechanism. >> > > >>>> >> > > >> >> > > We >> > > >>>> >> > > >> >> > > > > actually thought about whether we want to >> > > separate >> > > >>>> 2 and >> > > >>>> >> 3 >> > > >>>> >> > > >> >> internally >> > > >>>> >> > > >> >> > > > > before creating the KIP. The reason we put 2 >> > and >> > > 3 >> > > >>>> >> together >> > > >>>> >> > > is >> > > >>>> >> > > >> it >> > > >>>> >> > > >> >> > will >> > > >>>> >> > > >> >> > > > > saves us another cross board wire protocol >> > > change. >> > > >>>> Like >> > > >>>> >> you >> > > >>>> >> > > >> said, >> > > >>>> >> > > >> >> we >> > > >>>> >> > > >> >> > > have >> > > >>>> >> > > >> >> > > > > to migrate all the clients in all languages. >> To >> > > >>>> some >> > > >>>> >> > extent, >> > > >>>> >> > > the >> > > >>>> >> > > >> >> > effort >> > > >>>> >> > > >> >> > > > to >> > > >>>> >> > > >> >> > > > > spend on upgrading the clients can be even >> > bigger >> > > >>>> than >> > > >>>> >> > > >> implementing >> > > >>>> >> > > >> >> > the >> > > >>>> >> > > >> >> > > > new >> > > >>>> >> > > >> >> > > > > feature itself. So there are some attractions >> > if >> > > >>>> we can >> > > >>>> >> do >> > > >>>> >> > 2 >> > > >>>> >> > > >> and 3 >> > > >>>> >> > > >> >> > > > together >> > > >>>> >> > > >> >> > > > > instead of separately. Maybe after (1) is >> done >> > it >> > > >>>> will be >> > > >>>> >> > > >> easier to >> > > >>>> >> > > >> >> > do >> > > >>>> >> > > >> >> > > > > protocol migration. But if we are able to >> come >> > to >> > > >>>> an >> > > >>>> >> > > agreement >> > > >>>> >> > > >> on >> > > >>>> >> > > >> >> the >> > > >>>> >> > > >> >> > > > > timestamp solution, I would prefer to have it >> > > >>>> together >> > > >>>> >> with >> > > >>>> >> > > >> >> relative >> > > >>>> >> > > >> >> > > > offset >> > > >>>> >> > > >> >> > > > > in the interest of avoiding another wire >> > protocol >> > > >>>> change >> > > >>>> >> > (the >> > > >>>> >> > > >> >> process >> > > >>>> >> > > >> >> > > to >> > > >>>> >> > > >> >> > > > > migrate to relative offset is exactly the >> same >> > as >> > > >>>> migrate >> > > >>>> >> > to >> > > >>>> >> > > >> >> message >> > > >>>> >> > > >> >> > > with >> > > >>>> >> > > >> >> > > > > timestamp). >> > > >>>> >> > > >> >> > > > > >> > > >>>> >> > > >> >> > > > > In terms of timestamp. I completely agree >> that >> > > >>>> having >> > > >>>> >> > client >> > > >>>> >> > > >> >> > timestamp >> > > >>>> >> > > >> >> > > is >> > > >>>> >> > > >> >> > > > > more useful if we can make sure the timestamp >> > is >> > > >>>> good. >> > > >>>> >> But >> > > >>>> >> > in >> > > >>>> >> > > >> >> reality >> > > >>>> >> > > >> >> > > > that >> > > >>>> >> > > >> >> > > > > can be a really big *IF*. I think the problem >> > is >> > > >>>> exactly >> > > >>>> >> as >> > > >>>> >> > > Ewen >> > > >>>> >> > > >> >> > > > mentioned, >> > > >>>> >> > > >> >> > > > > if we let the client to set the timestamp, it >> > > >>>> would be >> > > >>>> >> very >> > > >>>> >> > > hard >> > > >>>> >> > > >> >> for >> > > >>>> >> > > >> >> > > the >> > > >>>> >> > > >> >> > > > > broker to utilize it. If broker apply >> retention >> > > >>>> policy >> > > >>>> >> > based >> > > >>>> >> > > on >> > > >>>> >> > > >> the >> > > >>>> >> > > >> >> > > > client >> > > >>>> >> > > >> >> > > > > timestamp. 
One misbehave producer can >> > potentially >> > > >>>> >> > completely >> > > >>>> >> > > >> mess >> > > >>>> >> > > >> >> up >> > > >>>> >> > > >> >> > > the >> > > >>>> >> > > >> >> > > > > retention policy on the broker. Although >> people >> > > >>>> don't >> > > >>>> >> care >> > > >>>> >> > > about >> > > >>>> >> > > >> >> > server >> > > >>>> >> > > >> >> > > > > side timestamp. People do care a lot when >> > > timestamp >> > > >>>> >> breaks. >> > > >>>> >> > > >> >> Searching >> > > >>>> >> > > >> >> > > by >> > > >>>> >> > > >> >> > > > > timestamp is a really important use case even >> > > >>>> though it >> > > >>>> >> is >> > > >>>> >> > > not >> > > >>>> >> > > >> used >> > > >>>> >> > > >> >> > as >> > > >>>> >> > > >> >> > > > > often as searching by offset. It has >> > significant >> > > >>>> direct >> > > >>>> >> > > impact >> > > >>>> >> > > >> on >> > > >>>> >> > > >> >> RTO >> > > >>>> >> > > >> >> > > > when >> > > >>>> >> > > >> >> > > > > there is a cross cluster failover as Todd >> > > >>>> mentioned. >> > > >>>> >> > > >> >> > > > > >> > > >>>> >> > > >> >> > > > > The trick using max(lastAppendedTimestamp, >> > > >>>> >> > currentTimeMillis) >> > > >>>> >> > > >> is to >> > > >>>> >> > > >> >> > > > > guarantee monotonic increase of the >> timestamp. >> > > Many >> > > >>>> >> > > commercial >> > > >>>> >> > > >> >> system >> > > >>>> >> > > >> >> > > > > actually do something similar to this to >> solve >> > > the >> > > >>>> time >> > > >>>> >> > skew. >> > > >>>> >> > > >> About >> > > >>>> >> > > >> >> > > > > changing the time, I am not sure if people >> use >> > > NTP >> > > >>>> like >> > > >>>> >> > > using a >> > > >>>> >> > > >> >> watch >> > > >>>> >> > > >> >> > > to >> > > >>>> >> > > >> >> > > > > just set it forward/backward by an hour or >> so. >> > > The >> > > >>>> time >> > > >>>> >> > > >> adjustment >> > > >>>> >> > > >> >> I >> > > >>>> >> > > >> >> > > used >> > > >>>> >> > > >> >> > > > > to do is typically to adjust something like a >> > > >>>> minute / >> > > >>>> >> > > week. So >> > > >>>> >> > > >> >> for >> > > >>>> >> > > >> >> > > each >> > > >>>> >> > > >> >> > > > > second, there might be a few mircoseconds >> > > >>>> slower/faster >> > > >>>> >> but >> > > >>>> >> > > >> should >> > > >>>> >> > > >> >> > not >> > > >>>> >> > > >> >> > > > > break the clock completely to make sure all >> the >> > > >>>> >> time-based >> > > >>>> >> > > >> >> > transactions >> > > >>>> >> > > >> >> > > > are >> > > >>>> >> > > >> >> > > > > not affected. The one minute change will be >> > done >> > > >>>> within a >> > > >>>> >> > > week >> > > >>>> >> > > >> but >> > > >>>> >> > > >> >> > not >> > > >>>> >> > > >> >> > > > > instantly. >> > > >>>> >> > > >> >> > > > > >> > > >>>> >> > > >> >> > > > > Personally, I think having client side >> > timestamp >> > > >>>> will be >> > > >>>> >> > > useful >> > > >>>> >> > > >> if >> > > >>>> >> > > >> >> we >> > > >>>> >> > > >> >> > > > don't >> > > >>>> >> > > >> >> > > > > need to put the broker and data integrity >> under >> > > >>>> risk. If >> > > >>>> >> we >> > > >>>> >> > > >> have to >> > > >>>> >> > > >> >> > > > choose >> > > >>>> >> > > >> >> > > > > from one of them but not both. 
I would prefer >> > > >>>> server side >> > > >>>> >> > > >> timestamp >> > > >>>> >> > > >> >> > > > because >> > > >>>> >> > > >> >> > > > > for client side timestamp there is always a >> > plan >> > > B >> > > >>>> which >> > > >>>> >> is >> > > >>>> >> > > >> putting >> > > >>>> >> > > >> >> > the >> > > >>>> >> > > >> >> > > > > timestamp into payload. >> > > >>>> >> > > >> >> > > > > >> > > >>>> >> > > >> >> > > > > Another reason I am reluctant to use the >> client >> > > >>>> side >> > > >>>> >> > > timestamp >> > > >>>> >> > > >> is >> > > >>>> >> > > >> >> > that >> > > >>>> >> > > >> >> > > it >> > > >>>> >> > > >> >> > > > > is always dangerous to mix the control plane >> > with >> > > >>>> data >> > > >>>> >> > > plane. IP >> > > >>>> >> > > >> >> did >> > > >>>> >> > > >> >> > > this >> > > >>>> >> > > >> >> > > > > and it has caused so many different breaches >> so >> > > >>>> people >> > > >>>> >> are >> > > >>>> >> > > >> >> migrating >> > > >>>> >> > > >> >> > to >> > > >>>> >> > > >> >> > > > > something like MPLS. An example in Kafka is >> > that >> > > >>>> any >> > > >>>> >> client >> > > >>>> >> > > can >> > > >>>> >> > > >> >> > > > construct a >> > > >>>> >> > > >> >> > > > > >> > > >>>> >> > > >> >> > > >>>> >> LeaderAndIsrRequest/UpdateMetadataRequest/ContorlledShutdownRequest >> > > >>>> >> > > >> >> > > (you >> > > >>>> >> > > >> >> > > > > name it) and send it to the broker to mess up >> > the >> > > >>>> entire >> > > >>>> >> > > >> cluster, >> > > >>>> >> > > >> >> > also >> > > >>>> >> > > >> >> > > as >> > > >>>> >> > > >> >> > > > > we already noticed a busy cluster can respond >> > > >>>> quite slow >> > > >>>> >> to >> > > >>>> >> > > >> >> > controller >> > > >>>> >> > > >> >> > > > > messages. So it would really be nice if we >> can >> > > >>>> avoid >> > > >>>> >> giving >> > > >>>> >> > > the >> > > >>>> >> > > >> >> power >> > > >>>> >> > > >> >> > > to >> > > >>>> >> > > >> >> > > > > clients to control the log retention. >> > > >>>> >> > > >> >> > > > > >> > > >>>> >> > > >> >> > > > > Thanks, >> > > >>>> >> > > >> >> > > > > >> > > >>>> >> > > >> >> > > > > Jiangjie (Becket) Qin >> > > >>>> >> > > >> >> > > > > >> > > >>>> >> > > >> >> > > > > >> > > >>>> >> > > >> >> > > > > On Sun, Sep 6, 2015 at 9:54 PM, Todd Palino < >> > > >>>> >> > > tpal...@gmail.com> >> > > >>>> >> > > >> >> > wrote: >> > > >>>> >> > > >> >> > > > > >> > > >>>> >> > > >> >> > > > > > So, with regards to why you want to search >> by >> > > >>>> >> timestamp, >> > > >>>> >> > > the >> > > >>>> >> > > >> >> > biggest >> > > >>>> >> > > >> >> > > > > > problem I've seen is with consumers who >> want >> > to >> > > >>>> reset >> > > >>>> >> > their >> > > >>>> >> > > >> >> > > timestamps >> > > >>>> >> > > >> >> > > > > to a >> > > >>>> >> > > >> >> > > > > > specific point, whether it is to replay a >> > > certain >> > > >>>> >> amount >> > > >>>> >> > of >> > > >>>> >> > > >> >> > messages, >> > > >>>> >> > > >> >> > > > or >> > > >>>> >> > > >> >> > > > > to >> > > >>>> >> > > >> >> > > > > > rewind to before some problem state >> existed. >> > > This >> > > >>>> >> happens >> > > >>>> >> > > more >> > > >>>> >> > > >> >> > often >> > > >>>> >> > > >> >> > > > than >> > > >>>> >> > > >> >> > > > > > anyone would like. 
>> > > >>>> >> > > >> >> > > > > > >> > > >>>> >> > > >> >> > > > > > To handle this now we need to constantly >> > export >> > > >>>> the >> > > >>>> >> > > broker's >> > > >>>> >> > > >> >> offset >> > > >>>> >> > > >> >> > > for >> > > >>>> >> > > >> >> > > > > > every partition to a time-series database >> and >> > > >>>> then use >> > > >>>> >> > > >> external >> > > >>>> >> > > >> >> > > > processes >> > > >>>> >> > > >> >> > > > > > to query this. I know we're not the only >> ones >> > > >>>> doing >> > > >>>> >> this. >> > > >>>> >> > > The >> > > >>>> >> > > >> way >> > > >>>> >> > > >> >> > the >> > > >>>> >> > > >> >> > > > > > broker handles requests for offsets by >> > > timestamp >> > > >>>> is a >> > > >>>> >> > > little >> > > >>>> >> > > >> >> obtuse >> > > >>>> >> > > >> >> > > > > > (explain it to anyone without intimate >> > > knowledge >> > > >>>> of the >> > > >>>> >> > > >> internal >> > > >>>> >> > > >> >> > > > workings >> > > >>>> >> > > >> >> > > > > > of the broker - every time I do I see >> this). >> > In >> > > >>>> >> addition, >> > > >>>> >> > > as >> > > >>>> >> > > >> >> Becket >> > > >>>> >> > > >> >> > > > > pointed >> > > >>>> >> > > >> >> > > > > > out, it causes problems specifically with >> > > >>>> retention of >> > > >>>> >> > > >> messages >> > > >>>> >> > > >> >> by >> > > >>>> >> > > >> >> > > time >> > > >>>> >> > > >> >> > > > > > when you move partitions around. >> > > >>>> >> > > >> >> > > > > > >> > > >>>> >> > > >> >> > > > > > I'm deliberately avoiding the discussion of >> > > what >> > > >>>> >> > timestamp >> > > >>>> >> > > to >> > > >>>> >> > > >> >> use. >> > > >>>> >> > > >> >> > I >> > > >>>> >> > > >> >> > > > can >> > > >>>> >> > > >> >> > > > > > see the argument either way, though I tend >> to >> > > >>>> lean >> > > >>>> >> > towards >> > > >>>> >> > > the >> > > >>>> >> > > >> >> idea >> > > >>>> >> > > >> >> > > > that >> > > >>>> >> > > >> >> > > > > > the broker timestamp is the only viable >> > source >> > > >>>> of truth >> > > >>>> >> > in >> > > >>>> >> > > >> this >> > > >>>> >> > > >> >> > > > > situation. >> > > >>>> >> > > >> >> > > > > > >> > > >>>> >> > > >> >> > > > > > -Todd >> > > >>>> >> > > >> >> > > > > > >> > > >>>> >> > > >> >> > > > > > >> > > >>>> >> > > >> >> > > > > > On Sun, Sep 6, 2015 at 7:08 PM, Ewen >> > > >>>> Cheslack-Postava < >> > > >>>> >> > > >> >> > > > e...@confluent.io >> > > >>>> >> > > >> >> > > > > > >> > > >>>> >> > > >> >> > > > > > wrote: >> > > >>>> >> > > >> >> > > > > > >> > > >>>> >> > > >> >> > > > > > > On Sun, Sep 6, 2015 at 4:57 PM, Jay >> Kreps < >> > > >>>> >> > > j...@confluent.io >> > > >>>> >> > > >> > >> > > >>>> >> > > >> >> > > wrote: >> > > >>>> >> > > >> >> > > > > > > >> > > >>>> >> > > >> >> > > > > > > > >> > > >>>> >> > > >> >> > > > > > > > 2. Nobody cares what time it is on the >> > > >>>> server. >> > > >>>> >> > > >> >> > > > > > > > >> > > >>>> >> > > >> >> > > > > > > >> > > >>>> >> > > >> >> > > > > > > This is a good way of summarizing the >> > issue I >> > > >>>> was >> > > >>>> >> > trying >> > > >>>> >> > > to >> > > >>>> >> > > >> get >> > > >>>> >> > > >> >> > at, >> > > >>>> >> > > >> >> > > > > from >> > > >>>> >> > > >> >> > > > > > an >> > > >>>> >> > > >> >> > > > > > > app's perspective. 
Of the 3 stated goals >> of >> > > >>>> the KIP, >> > > >>>> >> #2 >> > > >>>> >> > > (lot >> > > >>>> >> > > >> >> > > > retention) >> > > >>>> >> > > >> >> > > > > > is >> > > >>>> >> > > >> >> > > > > > > reasonably handled by a server-side >> > > timestamp. >> > > >>>> I >> > > >>>> >> really >> > > >>>> >> > > just >> > > >>>> >> > > >> >> care >> > > >>>> >> > > >> >> > > > that >> > > >>>> >> > > >> >> > > > > a >> > > >>>> >> > > >> >> > > > > > > message is there long enough that I have >> a >> > > >>>> chance to >> > > >>>> >> > > process >> > > >>>> >> > > >> >> it. >> > > >>>> >> > > >> >> > #3 >> > > >>>> >> > > >> >> > > > > > > (searching by timestamp) only seems >> useful >> > if >> > > >>>> we can >> > > >>>> >> > > >> guarantee >> > > >>>> >> > > >> >> > the >> > > >>>> >> > > >> >> > > > > > > server-side timestamp is close enough to >> > the >> > > >>>> original >> > > >>>> >> > > >> >> client-side >> > > >>>> >> > > >> >> > > > > > > timestamp, and any mirror maker step >> seems >> > to >> > > >>>> break >> > > >>>> >> > that >> > > >>>> >> > > >> (even >> > > >>>> >> > > >> >> > > > ignoring >> > > >>>> >> > > >> >> > > > > > any >> > > >>>> >> > > >> >> > > > > > > issues with broker availability). >> > > >>>> >> > > >> >> > > > > > > >> > > >>>> >> > > >> >> > > > > > > I'm also wondering whether optimizing for >> > > >>>> >> > > >> search-by-timestamp >> > > >>>> >> > > >> >> on >> > > >>>> >> > > >> >> > > the >> > > >>>> >> > > >> >> > > > > > broker >> > > >>>> >> > > >> >> > > > > > > is really something we want to do given >> > that >> > > >>>> messages >> > > >>>> >> > > aren't >> > > >>>> >> > > >> >> > really >> > > >>>> >> > > >> >> > > > > > > guaranteed to be ordered by >> > application-level >> > > >>>> >> > timestamps >> > > >>>> >> > > on >> > > >>>> >> > > >> the >> > > >>>> >> > > >> >> > > > broker. >> > > >>>> >> > > >> >> > > > > > Is >> > > >>>> >> > > >> >> > > > > > > part of the need for this just due to the >> > > >>>> current >> > > >>>> >> > > consumer >> > > >>>> >> > > >> APIs >> > > >>>> >> > > >> >> > > being >> > > >>>> >> > > >> >> > > > > > > difficult to work with? For example, >> could >> > > you >> > > >>>> >> > implement >> > > >>>> >> > > >> this >> > > >>>> >> > > >> >> > > pretty >> > > >>>> >> > > >> >> > > > > > easily >> > > >>>> >> > > >> >> > > > > > > client side just the way you would >> > > >>>> broker-side? I'd >> > > >>>> >> > > imagine >> > > >>>> >> > > >> a >> > > >>>> >> > > >> >> > > couple >> > > >>>> >> > > >> >> > > > of >> > > >>>> >> > > >> >> > > > > > > random seeks + reads during very rare >> > > >>>> occasions (i.e. >> > > >>>> >> > > when >> > > >>>> >> > > >> the >> > > >>>> >> > > >> >> > app >> > > >>>> >> > > >> >> > > > > starts >> > > >>>> >> > > >> >> > > > > > > up) wouldn't be a problem >> performance-wise. 
>> > > Or >> > > >>>> is it >> > > >>>> >> > also >> > > >>>> >> > > >> that >> > > >>>> >> > > >> >> > you >> > > >>>> >> > > >> >> > > > need >> > > >>>> >> > > >> >> > > > > > the >> > > >>>> >> > > >> >> > > > > > > broker to enforce things like >> monotonically >> > > >>>> >> increasing >> > > >>>> >> > > >> >> timestamps >> > > >>>> >> > > >> >> > > > since >> > > >>>> >> > > >> >> > > > > > you >> > > >>>> >> > > >> >> > > > > > > can't do the query properly and >> efficiently >> > > >>>> without >> > > >>>> >> > that >> > > >>>> >> > > >> >> > guarantee, >> > > >>>> >> > > >> >> > > > and >> > > >>>> >> > > >> >> > > > > > > therefore what applications are actually >> > > >>>> looking for >> > > >>>> >> > *is* >> > > >>>> >> > > >> >> > > broker-side >> > > >>>> >> > > >> >> > > > > > > timestamps? >> > > >>>> >> > > >> >> > > > > > > >> > > >>>> >> > > >> >> > > > > > > -Ewen >> > > >>>> >> > > >> >> > > > > > > >> > > >>>> >> > > >> >> > > > > > > >> > > >>>> >> > > >> >> > > > > > > >> > > >>>> >> > > >> >> > > > > > > > Consider cases where data is being >> copied >> > > >>>> from a >> > > >>>> >> > > database >> > > >>>> >> > > >> or >> > > >>>> >> > > >> >> > from >> > > >>>> >> > > >> >> > > > log >> > > >>>> >> > > >> >> > > > > > > > files. In steady-state the server time >> is >> > > >>>> very >> > > >>>> >> close >> > > >>>> >> > to >> > > >>>> >> > > >> the >> > > >>>> >> > > >> >> > > client >> > > >>>> >> > > >> >> > > > > time >> > > >>>> >> > > >> >> > > > > > > if >> > > >>>> >> > > >> >> > > > > > > > their clocks are sync'd (see 1) but >> there >> > > >>>> will be >> > > >>>> >> > > times of >> > > >>>> >> > > >> >> > large >> > > >>>> >> > > >> >> > > > > > > divergence >> > > >>>> >> > > >> >> > > > > > > > when the copying process is stopped or >> > > falls >> > > >>>> >> behind. >> > > >>>> >> > > When >> > > >>>> >> > > >> >> this >> > > >>>> >> > > >> >> > > > occurs >> > > >>>> >> > > >> >> > > > > > it >> > > >>>> >> > > >> >> > > > > > > is >> > > >>>> >> > > >> >> > > > > > > > clear that the time the data arrived on >> > the >> > > >>>> server >> > > >>>> >> is >> > > >>>> >> > > >> >> > irrelevant, >> > > >>>> >> > > >> >> > > > it >> > > >>>> >> > > >> >> > > > > is >> > > >>>> >> > > >> >> > > > > > > the >> > > >>>> >> > > >> >> > > > > > > > source timestamp that matters. This is >> > the >> > > >>>> problem >> > > >>>> >> > you >> > > >>>> >> > > are >> > > >>>> >> > > >> >> > trying >> > > >>>> >> > > >> >> > > > to >> > > >>>> >> > > >> >> > > > > > fix >> > > >>>> >> > > >> >> > > > > > > by >> > > >>>> >> > > >> >> > > > > > > > retaining the mm timestamp but really >> the >> > > >>>> client >> > > >>>> >> > should >> > > >>>> >> > > >> >> always >> > > >>>> >> > > >> >> > > set >> > > >>>> >> > > >> >> > > > > the >> > > >>>> >> > > >> >> > > > > > > time >> > > >>>> >> > > >> >> > > > > > > > with the use of server-side time as a >> > > >>>> fallback. 
>> > > > > > It would be worth talking to the Samza folks and reading through
>> > > > > > this blog post
>> > > > > > (http://radar.oreilly.com/2015/08/the-world-beyond-batch-streaming-101.html)
>> > > > > > on this subject, since we went through similar learnings on the
>> > > > > > stream processing side.
>> > > > > >
>> > > > > > I think the implication of these two is that we need a proposal
>> > > > > > that handles potentially very out-of-order timestamps in some
>> > > > > > kind of sane-ish way (buggy clients will set something totally
>> > > > > > wrong as the time).
>> > > > > >
>> > > > > > -Jay
>> > > > > >
>> > > > > > On Sun, Sep 6, 2015 at 4:22 PM, Jay Kreps <j...@confluent.io> wrote:
>> > > > > >
>> > > > > > > The magic byte is used to version the message format, so we'll
>> > > > > > > need to make sure that check is in place--I actually don't see
>> > > > > > > it in the current consumer code, which I think is a bug we
>> > > > > > > should fix for the next release (filed KAFKA-2523). The purpose
>> > > > > > > of that field is so there is a clear check on the format rather
>> > > > > > > than the scrambled scenarios Becket describes.
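>> > > > > > >
>> > > > > > > Roughly the kind of check I mean (a sketch, not the actual
>> > > > > > > consumer code; it assumes the v0 layout where the magic byte
>> > > > > > > follows the 4-byte CRC):
>> > > > > > >
>> > > > > > >   import java.nio.ByteBuffer;
>> > > > > > >
>> > > > > > >   // Dispatch on the magic byte before interpreting anything else.
>> > > > > > >   static void checkMagic(ByteBuffer message) {
>> > > > > > >       byte magic = message.get(4); // magic follows the CRC
>> > > > > > >       if (magic != 0 && magic != 1)
>> > > > > > >           throw new IllegalStateException(
>> > > > > > >               "Unknown message format version: " + magic);
>> > > > > > >   }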
>> > > > > > > Also, Becket, I don't think just fixing the java client is
>> > > > > > > sufficient, as that would break other clients--i.e. if anyone
>> > > > > > > writes v1 messages, even by accident, any non-v1-capable
>> > > > > > > consumer will break. I think we probably need a way to have the
>> > > > > > > server ensure a particular message format either at read or
>> > > > > > > write time.
>> > > > > > >
>> > > > > > > -Jay
>> > > > > > >
>> > > > > > > On Thu, Sep 3, 2015 at 3:47 PM, Jiangjie Qin
>> > > > > > > <j...@linkedin.com.invalid> wrote:
>> > > > > > >
>> > > > > > > > Hi Guozhang,
>> > > > > > > >
>> > > > > > > > I checked the code again. Actually the CRC check probably
>> > > > > > > > won't fail. The newly added timestamp field might be treated
>> > > > > > > > as the keyLength instead, so we are likely to receive an
>> > > > > > > > IllegalArgumentException when trying to read the key. I'll
>> > > > > > > > update the KIP.
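>> > > > > > > >
>> > > > > > > > A toy illustration of that failure mode (field widths as in
>> > > > > > > > this discussion, payload bytes made up): a v0-style reader
>> > > > > > > > applied to a v1 message interprets the first 4 bytes of the
>> > > > > > > > 8-byte timestamp as the key length.
>> > > > > > > >
>> > > > > > > >   import java.nio.ByteBuffer;
>> > > > > > > >
>> > > > > > > >   public class CrossVersionParse {
>> > > > > > > >       public static void main(String[] args) {
>> > > > > > > >           ByteBuffer v1 = ByteBuffer.allocate(64);
>> > > > > > > >           v1.putInt(0);             // crc (dummy)
>> > > > > > > >           v1.put((byte) 1);         // magic = 1
>> > > > > > > >           v1.put((byte) 0);         // attributes
>> > > > > > > >           v1.putLong(System.currentTimeMillis()); // timestamp
>> > > > > > > >           v1.putInt(-1);            // key length (null key)
>> > > > > > > >           v1.putInt(3).put(new byte[]{1, 2, 3}); // value
>> > > > > > > >           v1.flip();
>> > > > > > > >
>> > > > > > > >           // v0 reader: skip crc/magic/attributes, read keyLength.
>> > > > > > > >           v1.position(6);
>> > > > > > > >           int bogusKeyLength = v1.getInt();
>> > > > > > > >           // High 4 bytes of a 2015 epoch-millis value, ~335;
>> > > > > > > >           // slicing a 335-byte "key" out of the few remaining
>> > > > > > > >           // bytes then fails with an exception.
>> > > > > > > >           System.out.println("v0 reader sees keyLength = "
>> > > > > > > >                              + bogusKeyLength);
>> > > > > > > >       }
>> > > > > > > >   }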
>> > > > > > > >
>> > > > > > > > Thanks,
>> > > > > > > >
>> > > > > > > > Jiangjie (Becket) Qin
>> > > > > > > >
>> > > > > > > > On Thu, Sep 3, 2015 at 12:48 PM, Jiangjie Qin
>> > > > > > > > <j...@linkedin.com> wrote:
>> > > > > > > >
>> > > > > > > > > Hi, Guozhang,
>> > > > > > > > >
>> > > > > > > > > Thanks for reading the KIP. By "old consumer", I meant the
>> > > > > > > > > ZookeeperConsumerConnector in trunk now, i.e. without this
>> > > > > > > > > bug fixed. If we fix the ZookeeperConsumerConnector then it
>> > > > > > > > > will throw an exception complaining about the unsupported
>> > > > > > > > > version when it sees message format V1. What I was trying to
>> > > > > > > > > say is that if we have some ZookeeperConsumerConnector
>> > > > > > > > > running without the fix, the consumer will complain about a
>> > > > > > > > > CRC mismatch instead of an unsupported version.
>> > > > > > > > >
>> > > > > > > > > Thanks,
>> > > > > > > > >
>> > > > > > > > > Jiangjie (Becket) Qin
>> > > > > > > > >
>> > > > > > > > > On Thu, Sep 3, 2015 at 12:15 PM, Guozhang Wang
>> > > > > > > > > <wangg...@gmail.com> wrote:
>> > > > > > > > >
>> > > > > > > > > > Thanks for the write-up Jiangjie.
>> > > > > > > > > >
>> > > > > > > > > > One comment about the migration plan: "For old consumers,
>> > > > > > > > > > if they see the new protocol the CRC check will fail"...
>> > > > > > > > > >
>> > > > > > > > > > Do you mean this bug in the old consumer cannot be fixed
>> > > > > > > > > > in a backward-compatible way?
>> > > > > > > > > >
>> > > > > > > > > > Guozhang
>> > > > > > > > > >
>> > > > > > > > > > On Thu, Sep 3, 2015 at 8:35 AM, Jiangjie Qin
>> > > > > > > > > > <j...@linkedin.com.invalid> wrote:
>> > > > > > > > > >
>> > > > > > > > > > > Hi,
>> > > > > > > > > > >
>> > > > > > > > > > > We just created KIP-31 to propose a message format
>> > > > > > > > > > > change in Kafka.
>> > > > > > > > > > >
>> > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-31+-+Message+format+change+proposal
>> > > > > > > > > > >
>> > > > > > > > > > > As a summary, the motivations are:
>> > > > > > > > > > > 1. Avoid server-side message re-compression.
>> > > > > > > > > > > 2. Honor time-based log roll and retention.
>> > > > > > > > > > > 3. Enable offset search by timestamp at a finer
>> > > > > > > > > > >    granularity.
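>> > > > > > > > > > >
>> > > > > > > > > > > To give a feel for (1), a much-simplified sketch of the
>> > > > > > > > > > > relative-offset idea (see the wiki for the real
>> > > > > > > > > > > proposal; this assumes the wrapper message carries the
>> > > > > > > > > > > absolute offset of the last inner message):
>> > > > > > > > > > >
>> > > > > > > > > > >   public class RelativeOffsets {
>> > > > > > > > > > >       // Inner messages carry relative offsets 0..n-1;
>> > > > > > > > > > >       // the wrapper carries the absolute offset of the
>> > > > > > > > > > >       // last inner message, so the broker never has to
>> > > > > > > > > > >       // decompress and rewrite the inner messages.
>> > > > > > > > > > >       static long absolute(long wrapperOffset,
>> > > > > > > > > > >                            int lastRelative, int relative) {
>> > > > > > > > > > >           return wrapperOffset - lastRelative + relative;
>> > > > > > > > > > >       }
>> > > > > > > > > > >       public static void main(String[] args) {
>> > > > > > > > > > >           // wrapper offset 107 over 5 inner messages
>> > > > > > > > > > >           // (relative 0..4) -> absolute 103..107
>> > > > > > > > > > >           for (int r = 0; r <= 4; r++)
>> > > > > > > > > > >               System.out.println(absolute(107, 4, r));
>> > > > > > > > > > >       }
>> > > > > > > > > > >   }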
>> > > > > > > > > > >
>> > > > > > > > > > > Feedback and comments are welcome!
>> > > > > > > > > > >
>> > > > > > > > > > > Thanks,
>> > > > > > > > > > >
>> > > > > > > > > > > Jiangjie (Becket) Qin
>> > > > > > > > > >
>> > > > > > > > > > --
>> > > > > > > > > > -- Guozhang
>> > > > >
>> > > > > --
>> > > > > Thanks,
>> > > > > Ewen
>> > > >
>> > > > --
>> > > > Thanks,
>> > > > Neha
>> > >
>> > > --
>> > > Thanks,
>> > > Ewen