Hi Alieh,

Thanks for the KIP! I have a few follow-up thoughts on the existing
points raised by others, as well as a few new questions regarding the
implementation details.

Regarding MJS1 and chia_0: I agree with Matthias and Chia-Ping here.
Introducing a 2-byte length prefix artificially caps the total
serialized header size at 64 KB and seems redundant if we are already
using a varint for the header count.
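
To make the redundancy concrete, here is a small illustration (my own
sketch of a protobuf-style unsigned varint, not Kafka's internal
utilities): a varint encodes a typical header count in a single byte
and has no fixed upper bound, whereas a dedicated 2-byte prefix both
adds bytes and introduces the 64 KB ceiling.

import java.io.ByteArrayOutputStream;

// Illustration only: unsigned-varint encoding; small counts take one
// byte, larger ones grow as needed, with no hard size cap.
final class VarintSketch {

    static void writeUnsignedVarint(int value, ByteArrayOutputStream out) {
        while ((value & 0xFFFFFF80) != 0) {
            out.write((value & 0x7F) | 0x80); // low 7 bits plus continuation bit
            value >>>= 7;
        }
        out.write(value); // final byte, continuation bit clear
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeUnsignedVarint(3, out);   // typical header count: 1 byte
        writeUnsignedVarint(300, out); // larger count: 2 bytes, still unbounded
        System.out.println("encoded length: " + out.size()); // prints 3
    }
}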

Regarding chia_1 (Migration path): I also think this point needs
further explanation; in particular, it is not clear to me how the lazy
migration is supposed to work.

LB1) First, could you clarify the interaction with the changelog
topic? The KIP states the design "does not alter the public changelog
record format." Does this mean the headers will be preserved in the
changelog topic as native Kafka record headers, or will they be
embedded inside the changelog record's value payload (matching the
RocksDB format)? If they are embedded in the value, this would
effectively change the data format for any external consumers reading
that changelog.
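
For context on why this matters, here is a sketch of what an external
consumer of the changelog topic would deal with in each case (the topic
name, config, and header key below are placeholders of mine, not
anything from the KIP).

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.header.Header;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

public class ChangelogReaderSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "changelog-inspector");

        try (KafkaConsumer<byte[], byte[]> consumer =
                 new KafkaConsumer<>(props, new ByteArrayDeserializer(), new ByteArrayDeserializer())) {
            consumer.subscribe(List.of("my-app-my-store-changelog"));
            for (ConsumerRecord<byte[], byte[]> record : consumer.poll(Duration.ofSeconds(1))) {
                // If headers stay native, existing tooling keeps working:
                Header source = record.headers().lastHeader("source");
                // If headers are embedded in the value, record.value() is no
                // longer the plain serialized value, and every external reader
                // must understand the internal store layout to decode it.
                byte[] payload = record.value();
            }
        }
    }
}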

LB2) Second, have you considered the performance implications for
Iterators? When iterating over a TimestampedKeyValueStoreWithHeaders,
will the implementation eagerly deserialize the headers for every
record, or will this be done lazily? If users are scanning large
ranges but only need the values, eager header deserialization could
introduce unnecessary overhead.
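
To illustrate the lazy option I have in mind (a sketch with invented
types, not a proposal for the actual interfaces): keep the raw bytes
and only materialize header objects when headers() is first called, so
a value-only range scan never pays for header deserialization.

import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

// Sketch with invented types: raw store bytes are kept as-is and the
// header block is decoded only on first access, so iterators that touch
// only value() skip header deserialization entirely.
final class LazyValueAndHeaders {

    private final byte[] raw;        // full store payload: [headers][value]
    private final int valueOffset;   // start of the value within raw
    private final Supplier<List<SimpleHeader>> headerParser; // hypothetical parser
    private List<SimpleHeader> parsedHeaders;                // cached after first call

    LazyValueAndHeaders(byte[] raw, int valueOffset, Supplier<List<SimpleHeader>> headerParser) {
        this.raw = raw;
        this.valueOffset = valueOffset;
        this.headerParser = headerParser;
    }

    byte[] value() {
        // No header objects are created on this path.
        return Arrays.copyOfRange(raw, valueOffset, raw.length);
    }

    List<SimpleHeader> headers() {
        if (parsedHeaders == null) {
            parsedHeaders = headerParser.get(); // pay the parsing cost only on demand
        }
        return parsedHeaders;
    }

    record SimpleHeader(String key, byte[] value) { }
}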

LB3) Third, regarding the serialization format design, have you
considered adding a specific "Magic Byte" or "Version Byte" at the
very beginning of the payload (e.g., [MagicByte] [HeaderCount]
[Headers] [Value])? This would provide a safe way to handle the lazy
migration mentioned in chia_1, allowing the get() method to
definitively check if a record is in the new format or the legacy
format before attempting to parse it.
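
Something along these lines is what I mean, purely as a sketch: the
magic value 0x01, the exact layout, and the use of plain 4-byte ints
instead of varints are all my own assumptions for illustration, and the
check relies on legacy values never starting with the reserved magic
byte.

import java.nio.ByteBuffer;

// Sketch of format detection during lazy migration; assumes the layout
// [magic][headerCount][keyLen,key,valLen,val ...][value].
final class VersionedValueSketch {

    private static final byte HEADERS_MAGIC_V1 = 0x01;

    /** Returns just the value bytes, whether the record is legacy or new format. */
    static byte[] extractValue(final byte[] raw) {
        if (raw == null || raw.length == 0 || raw[0] != HEADERS_MAGIC_V1) {
            return raw; // legacy record: the payload is the value itself
        }
        final ByteBuffer buf = ByteBuffer.wrap(raw);
        buf.get();                                       // magic byte
        final int headerCount = buf.getInt();
        for (int i = 0; i < headerCount; i++) {
            buf.position(buf.position() + buf.getInt()); // skip header key
            buf.position(buf.position() + buf.getInt()); // skip header value
        }
        final byte[] value = new byte[buf.remaining()];
        buf.get(value);
        return value;                                    // new format: value follows headers
    }
}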

Thanks again for the work on this.

Best, Lucas
