Hey Matthias, It looks a little confusing, but I don’t have enough expertise to judge on the configuration placement.
If you think, it is fine I’ll go ahead with this approach. Best, Daniyar Yeralin > On Jul 19, 2019, at 5:49 PM, Matthias J. Sax <matth...@confluent.io> wrote: > > Good point. > > I guess the simplest solution is, to actually add > >>> default.list.key/value.serde.type >>> default.list.key/value.serde.inner > > to both `CommonClientConfigs` and `StreamsConfig`. > > It's not super clean, but I think it's the best we can do. Thoughts? > > > -Matthias > > On 7/19/19 1:23 PM, Development wrote: >> Hi Matthias, >> >> I agree, ConsumerConfig did not seem like a right place for these >> configurations. >> I’ll put them in ProducerConfig, ConsumerConfig, and StreamsConfig. >> >> However, I have a question. What should I do in "configure(Map<String, ?> >> configs, boolean isKey)” methods? Which configurations should I try to >> locate? I was comparing my (de)serializer implementations with >> SessionWindows(De)serializer classes, and they use StreamsConfig class to >> get either StreamsConfig.DEFAULT_WINDOWED_KEY_SERDE_INNER_CLASS : >> StreamsConfig.DEFAULT_WINDOWED_VALUE_SERDE_INNER_CLASS >> >> In my case, as I mentioned earlier, StreamsConfig class is not accessible >> from org.apache.kafka.common.serialization package. So, I can’t utilize it. >> Any suggestions here? >> >> Best, >> Daniyar Yeralin >> >> >>> On Jul 18, 2019, at 8:46 PM, Matthias J. Sax <matth...@confluent.io> wrote: >>> >>> Thanks! >>> >>> One minor question about the configs. The KIP adds three classes, a >>> Serializer, a Deserializer, and a Serde. >>> >>> Hence, would it make sense to add the corresponding configs to >>> `ConsumerConfig`, `ProducerConfig`, and `StreamsConfig` using slightly >>> different names each time? >>> >>> >>> Somethin like this: >>> >>> ProducerConfig: >>> >>> list.key/value.serializer.type >>> list.key/value.serializer.inner >>> >>> ConsumerConfig: >>> >>> list.key/value.deserializer.type >>> list.key/value.deserializer.inner >>> >>> StreamsConfig: >>> >>> default.list.key/value.serde.type >>> default.list.key/value.serde.inner >>> >>> >>> Adding `d.l.k/v.serde.t/i` to `CommonClientConfigs does not sound right >>> to me. Also note, that it seems better to avoid the `default.` prefix >>> for consumers and producers because there is only one Serializer or >>> Deserializer anyway. Only for Streams, there are multiple and >>> StreamsConfig specifies the default one of an operator does not >>> overwrite it. >>> >>> Thoughts? >>> >>> >>> Also, the KIP should explicitly mention to what classed certain configs >>> are added. Atm, the KIP only list parameter names, but does not state >>> where those are added. >>> >>> >>> -Matthias >>> >>> >>> >>> >>> >>> On 7/16/19 1:11 PM, Development wrote: >>>> Hi, >>>> >>>> Yes, totally forgot about the statement. KIP-466 is updated. >>>> >>>> Thank you so much John Roesler, Matthias J. Sax, Sophie Blee-Goldman for >>>> your valuable input! >>>> >>>> I hope I did not cause too much trouble :) >>>> >>>> I’ll start the vote now. >>>> >>>> Best, >>>> Daniyar Yeralin >>>> >>>>> On Jul 16, 2019, at 3:17 PM, John Roesler <j...@confluent.io> wrote: >>>>> >>>>> Hi Daniyar, >>>>> >>>>> Thanks for that update. I took a look, and I think this is in good shape. >>>>> >>>>> One note, the statement "New method public static <T> Serde<List<T>> >>>>> ListSerde() in org.apache.kafka.common.serialization.Serdes class >>>>> (infers list implementation and inner serde from config file)" is >>>>> still present in the KIP, although I do it is was removed from the PR. >>>>> >>>>> Once you remove that statement from the KIP, then I think this KIP is >>>>> ready to go up for a vote! Then, we can really review the PR in >>>>> earnest and get this thing merged. >>>>> >>>>> Thanks, >>>>> -john >>>>> >>>>> On Tue, Jul 16, 2019 at 2:05 PM Development <d...@yeralin.net> wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> Pushed new changes under my PR: >>>>>> https://github.com/apache/kafka/pull/6592 >>>>>> <https://github.com/apache/kafka/pull/6592> >>>>>> >>>>>> Feel free to put any comments in there. >>>>>> >>>>>> Best, >>>>>> Daniyar Yeralin >>>>>> >>>>>>> On Jul 15, 2019, at 1:06 PM, Development <d...@yeralin.net> wrote: >>>>>>> >>>>>>> Hi John, >>>>>>> >>>>>>> I knew I was missing something. Yes, that makes sense now, I removed >>>>>>> all `listSerde()` methods, and left empty constructors instead. >>>>>>> >>>>>>> As per `CommonClientConfigs` I looked at the class, it doesn’t have any >>>>>>> properties related to serdes, and that bothers me a little. >>>>>>> >>>>>>> All properties like `default.key.serde` `default.windowed.key.serde.*` >>>>>>> are located in StreamsConfig. I don’t want to create a confusion. >>>>>>> What also doesn’t make sense to me is that `WindowedSerdes` and its >>>>>>> (de)serializers are not located in >>>>>>> org.apache.kafka.common.serialization. I guess it kind of makes sense >>>>>>> since windowed serdes are only available for kafka streams, not vice >>>>>>> versa. >>>>>>> >>>>>>> If everyone is okay to put list properties in `CommonClientConfigs` >>>>>>> class, I’ll go ahead and do that then. >>>>>>> >>>>>>> Thank you for your input! >>>>>>> >>>>>>> Best, >>>>>>> Daniyar Yeralin >>>>>>> >>>>>>>> On Jul 15, 2019, at 11:45 AM, John Roesler <j...@confluent.io> wrote: >>>>>>>> >>>>>>>> Hi all, >>>>>>>> >>>>>>>> Regarding the placement, you might as well move the constants to >>>>>>>> `org.apache.kafka.clients.CommonClientConfigs`, so that the constants >>>>>>>> and the configs and the code are in the same module. >>>>>>>> >>>>>>>> Regarding the constructor... What Matthias said is correct: The serde, >>>>>>>> serializer, and deserializer all need to have zero-arg constructors so >>>>>>>> they can be instantiated reflectively by Kafka. However, the factory >>>>>>>> method you proposed "New method public static <T> Serde<List<T>> >>>>>>>> ListSerde()" is not a constructor, and is not required. It would be >>>>>>>> used purely from the Java interface, but has the drawbacks I listed >>>>>>>> above. This method, not the constructor, is what I proposed to remove >>>>>>>> from the KIP. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> -John >>>>>>>> >>>>>>>> >>>>>>>> On Mon, Jul 15, 2019 at 10:15 AM Development <d...@yeralin.net >>>>>>>> <mailto:d...@yeralin.net>> wrote: >>>>>>>> One problem though. >>>>>>>> >>>>>>>> Since WindowedSerde (Windowed(De)Serializer) are so similar, I’m >>>>>>>> trying to mimic the implementation of my ListSerde accordingly. >>>>>>>> >>>>>>>> I created couple constants under StreamsConfig: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> And trying to do similar construct: >>>>>>>> final String propertyName = isKey ? >>>>>>>> StreamsConfig.DEFAULT_LIST_KEY_SERDE_INNER_CLASS : >>>>>>>> StreamsConfig.DEFAULT_LIST_VALUE_SERDE_INNER_CLASS; >>>>>>>> But then found out that StreamsConfig is not accessible from >>>>>>>> org.apache.kafka.common.serialization package while window serde >>>>>>>> (de)serializers are located under org.apache.kafka.streams.kstream >>>>>>>> package. >>>>>>>> >>>>>>>> What should I do? Should I move my classes under >>>>>>>> org.apache.kafka.streams.kstream package instead? >>>>>>>> >>>>>>>>> On Jul 15, 2019, at 10:45 AM, Development <d...@yeralin.net >>>>>>>>> <mailto:d...@yeralin.net>> wrote: >>>>>>>>> >>>>>>>>> Hi Matthias, >>>>>>>>> >>>>>>>>> Thank you for your input. >>>>>>>>> >>>>>>>>> I updated the KIP, made it a little more readable. >>>>>>>>> >>>>>>>>> I think the configuration parameters strategy is finalized then. >>>>>>>>> >>>>>>>>> Do you have any other questions/concerns regarding this KIP? >>>>>>>>> >>>>>>>>> Meanwhile I’ll start doing appropriate code changes, and commit them >>>>>>>>> under my PR. >>>>>>>>> >>>>>>>>> Best, >>>>>>>>> Daniyar Yeralin >>>>>>>>> >>>>>>>>>> On Jul 11, 2019, at 2:44 PM, Matthias J. Sax <matth...@confluent.io >>>>>>>>>> <mailto:matth...@confluent.io>> wrote: >>>>>>>>>> >>>>>>>>>> Daniyar, >>>>>>>>>> >>>>>>>>>> thanks for the update to the KIP. It's in really good shape and well >>>>>>>>>> written. >>>>>>>>>> >>>>>>>>>> About the default constructor question: >>>>>>>>>> >>>>>>>>>> All Serdes/Serializer/Deserializer classes need a default >>>>>>>>>> constructor to >>>>>>>>>> create them easily via reflections when specifies in a config. I >>>>>>>>>> understand that it is not super user friendly, but all existing code >>>>>>>>>> works this way. Hence, it seems best to stick with the established >>>>>>>>>> pattern. >>>>>>>>>> >>>>>>>>>> We have a similar issue with `TimeWindowedSerde` and >>>>>>>>>> `SessionWindowedSerde`, and I just recently did a PR to improve user >>>>>>>>>> experience that address the exact issue John raised. (cf >>>>>>>>>> https://github.com/apache/kafka/pull/7067 >>>>>>>>>> <https://github.com/apache/kafka/pull/7067>) >>>>>>>>>> >>>>>>>>>> Note, that if a user would instantiate the Serde manually, the user >>>>>>>>>> would also need to call `configure()` to setup the inner serdes. >>>>>>>>>> Kafka >>>>>>>>>> Streams would not setup those automatically and one might most likely >>>>>>>>>> end-up with an NPE. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Coming back the KIP, and the parameter names. `WindowedSerdes` are >>>>>>>>>> similar to `ListSerde` as they wrap another Serde. For >>>>>>>>>> `WindowedSerdes`, >>>>>>>>>> we use the following parameter names: >>>>>>>>>> >>>>>>>>>> - default.windowed.key.serde.inner >>>>>>>>>> - default.windowed.value.serde.inner >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> It might be good to align the naming pattern. I would also suggest to >>>>>>>>>> use `type` instead of `impl`? >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> default.key.list.serde.impl -> default.list.key.serde.type >>>>>>>>>> default.value.list.serde.impl -> default.list.value.serde.type >>>>>>>>>> default.key.list.serde.element -> default.list.key.serde.inner >>>>>>>>>> default.value.list.serde.element -> default.list.value.serde.inner >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -Matthias >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 7/10/19 8:52 AM, Development wrote: >>>>>>>>>>> Hi John, >>>>>>>>>>> >>>>>>>>>>> Yes, I do agree. That totally makes sense. The only thing is that >>>>>>>>>>> it goes against what Matthias suggested earlier: >>>>>>>>>>> "I think that ... `ListSerde` should have an default constructor >>>>>>>>>>> and it should be possible to pass in the `Class listClass` >>>>>>>>>>> information via a configuration. Otherwise, KafkaStreams cannot use >>>>>>>>>>> it as default serde.” >>>>>>>>>>> >>>>>>>>>>> What do you think about that? I hope I’m not confusing anything. >>>>>>>>>>> >>>>>>>>>>> Best, >>>>>>>>>>> Daniyar Yeralin >>>>>>>>>>> >>>>>>>>>>>> On Jul 9, 2019, at 5:56 PM, John Roesler <j...@confluent.io >>>>>>>>>>>> <mailto:j...@confluent.io>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> Ah, my apologies, I must have just overlooked it. Thanks for the >>>>>>>>>>>> update, too. >>>>>>>>>>>> >>>>>>>>>>>> Just one more super-small question, do we need this variant: >>>>>>>>>>>> >>>>>>>>>>>>> New method public static <T> Serde<List<T>> ListSerde() in >>>>>>>>>>>>> org.apache.kafka.common.serialization.Serdes class (infers list >>>>>>>>>>>>> implementation and inner serde from config file) >>>>>>>>>>>> >>>>>>>>>>>> It seems like this situation implies my config file is already set >>>>>>>>>>>> up for the list serde, so passing this serde (e.g., in Produced) >>>>>>>>>>>> would have the same effect as not specifying it. >>>>>>>>>>>> >>>>>>>>>>>> I guess that it could be the case that you have the >>>>>>>>>>>> `default.key/value.serde` set to something else, like StringSerde, >>>>>>>>>>>> but you still have the `default.key/value.list.serde.impl/element` >>>>>>>>>>>> set. This seems like it would result in more confusion than >>>>>>>>>>>> convenience, so my gut instinct is maybe we shouldn't introduce >>>>>>>>>>>> the `ListSerde()` variant until people actually request it later >>>>>>>>>>>> on. >>>>>>>>>>>> >>>>>>>>>>>> Thus, we'd just stick with fully config-driven or fully >>>>>>>>>>>> source-code-driven, not half/half. >>>>>>>>>>>> >>>>>>>>>>>> What do you think? >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> -John >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Tue, Jul 9, 2019 at 9:58 AM Development <d...@yeralin.net >>>>>>>>>>>> <mailto:d...@yeralin.net> <mailto:d...@yeralin.net >>>>>>>>>>>> <mailto:d...@yeralin.net>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Hi John, >>>>>>>>>>>>> >>>>>>>>>>>>> I hope everyone had a great long weekend. >>>>>>>>>>>>> >>>>>>>>>>>>> Regarding Java interfaces, I may not understand you correctly, >>>>>>>>>>>>> but I think I already listed them: >>>>>>>>>>>>> >>>>>>>>>>>>> So for Produced, you would use it in the following fashion, for >>>>>>>>>>>>> example: Produced.keySerde(Serdes.ListSerde(ArrayList.class, >>>>>>>>>>>>> Serdes.Integer())) >>>>>>>>>>>>> >>>>>>>>>>>>> I also updated the KIP, and added a section “Serialization >>>>>>>>>>>>> Strategy” where I describe our logic of conditional serialization >>>>>>>>>>>>> based on the type of an inner serde. >>>>>>>>>>>>> >>>>>>>>>>>>> Thank you! >>>>>>>>>>>>> >>>>>>>>>>>>> Best, >>>>>>>>>>>>> Daniyar Yeralin >>>>>>>>>>>>> >>>>>>>>>>>>> On Jun 26, 2019, at 11:44 AM, John Roesler <j...@confluent.io >>>>>>>>>>>>> <mailto:j...@confluent.io> <mailto:j...@confluent.io >>>>>>>>>>>>> <mailto:j...@confluent.io>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks for the update, Daniyar! >>>>>>>>>>>>> >>>>>>>>>>>>> In addition to specifying the config interface, can you also >>>>>>>>>>>>> specify >>>>>>>>>>>>> the Java interface? Namely, if I need to pass an instance of this >>>>>>>>>>>>> serde in to the DSL directly, as in Produced, Materialized, etc., >>>>>>>>>>>>> what >>>>>>>>>>>>> constructor(s) would I have available? Likewise with the >>>>>>>>>>>>> Serializer >>>>>>>>>>>>> and Deserailizer. I don't think you need to specify the >>>>>>>>>>>>> implementation >>>>>>>>>>>>> logic, since we've already discussed it here. >>>>>>>>>>>>> >>>>>>>>>>>>> If you also want to specify the serialized format of the data >>>>>>>>>>>>> records >>>>>>>>>>>>> in the KIP, it could be useful documentation, as well as letting >>>>>>>>>>>>> us >>>>>>>>>>>>> verify the schema for forward/backward compatibility concerns, >>>>>>>>>>>>> etc. >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> John >>>>>>>>>>>>> >>>>>>>>>>>>> On Wed, Jun 26, 2019 at 10:33 AM Development <d...@yeralin.net >>>>>>>>>>>>> <mailto:d...@yeralin.net> <mailto:d...@yeralin.net >>>>>>>>>>>>> <mailto:d...@yeralin.net>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Hey, >>>>>>>>>>>>> >>>>>>>>>>>>> Finally made updates to the KIP: >>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization >>>>>>>>>>>>> >>>>>>>>>>>>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization> >>>>>>>>>>>>> >>>>>>>>>>>>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization >>>>>>>>>>>>> >>>>>>>>>>>>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization>> >>>>>>>>>>>>> >>>>>>>>>>>>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-466:+Add+support+for+List%3CT%3E+serialization+and+deserialization >>>>>>>>>>>>> >>>>>>>>>>>>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-466:+Add+support+for+List%3CT%3E+serialization+and+deserialization> >>>>>>>>>>>>> >>>>>>>>>>>>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-466:+Add+support+for+List%3CT%3E+serialization+and+deserialization >>>>>>>>>>>>> >>>>>>>>>>>>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-466:+Add+support+for+List%3CT%3E+serialization+and+deserialization>>> >>>>>>>>>>>>> Sorry for the delay :) >>>>>>>>>>>>> >>>>>>>>>>>>> Thank You! >>>>>>>>>>>>> >>>>>>>>>>>>> Best, >>>>>>>>>>>>> Daniyar Yeralin >>>>>>>>>>>>> >>>>>>>>>>>>> On Jun 22, 2019, at 12:49 AM, Matthias J. Sax >>>>>>>>>>>>> <matth...@confluent.io <mailto:matth...@confluent.io> >>>>>>>>>>>>> <mailto:matth...@confluent.io <mailto:matth...@confluent.io>>> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Yes, something like this. I did not think about good configuration >>>>>>>>>>>>> parameter names yet. I am also not sure if I understand all >>>>>>>>>>>>> proposed >>>>>>>>>>>>> configs atm. But all configs should be listed and explained in >>>>>>>>>>>>> the KIP >>>>>>>>>>>>> anyway, and we can discuss further after you have updated the KIP >>>>>>>>>>>>> (I can >>>>>>>>>>>>> ask more detailed question if I have any). >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -Matthias >>>>>>>>>>>>> >>>>>>>>>>>>> On 6/21/19 2:05 PM, Development wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Yes, you are right. ByteSerializer is not what I need to have in >>>>>>>>>>>>> a list >>>>>>>>>>>>> of primitives. >>>>>>>>>>>>> >>>>>>>>>>>>> As for the default constructor and configurability, just want to >>>>>>>>>>>>> make >>>>>>>>>>>>> sure. Is this what you have on your mind? >>>>>>>>>>>>> >>>>>>>>>>>>> Best, >>>>>>>>>>>>> Daniyar Yeralin >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Jun 21, 2019, at 2:51 PM, Matthias J. Sax >>>>>>>>>>>>> <matth...@confluent.io <mailto:matth...@confluent.io> >>>>>>>>>>>>> <mailto:matth...@confluent.io <mailto:matth...@confluent.io>> >>>>>>>>>>>>> <mailto:matth...@confluent.io <mailto:matth...@confluent.io> >>>>>>>>>>>>> <mailto:matth...@confluent.io <mailto:matth...@confluent.io>>>> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks for the update! >>>>>>>>>>>>> >>>>>>>>>>>>> I think that `ListDeserializer`, `ListSerializer`, and `ListSerde` >>>>>>>>>>>>> should have an default constructor and it should be possible to >>>>>>>>>>>>> pass in >>>>>>>>>>>>> the `Class listClass` information via a configuration. Otherwise, >>>>>>>>>>>>> KafkaStreams cannot use it as default serde. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> For the primitive serializers: `BytesSerializer` is not primitive >>>>>>>>>>>>> IMHO, >>>>>>>>>>>>> as is it for `byte[]` with variable length -- it's for arrays, >>>>>>>>>>>>> not for >>>>>>>>>>>>> single `byte` (note, that `Bytes` is a Kafka class wrapping >>>>>>>>>>>>> `byte[]`). >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> For tests, we can comment on the PR. No need to do this in the KIP >>>>>>>>>>>>> discussion. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Can you also update the KIP? >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -Matthias >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On 6/21/19 11:29 AM, Development wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> I made and pushed necessary commits, so we could review the final >>>>>>>>>>>>> version under PR https://github.com/apache/kafka/pull/6592 >>>>>>>>>>>>> <https://github.com/apache/kafka/pull/6592> >>>>>>>>>>>>> <https://github.com/apache/kafka/pull/6592 >>>>>>>>>>>>> <https://github.com/apache/kafka/pull/6592>> >>>>>>>>>>>>> >>>>>>>>>>>>> I also need some advice on writing tests for this new serde. So >>>>>>>>>>>>> far I >>>>>>>>>>>>> only have two test cases (roundtrip and empty payload), I’m not >>>>>>>>>>>>> sure >>>>>>>>>>>>> if it is enough. >>>>>>>>>>>>> >>>>>>>>>>>>> Thank y’all for your help in this KIP :) >>>>>>>>>>>>> >>>>>>>>>>>>> Best, >>>>>>>>>>>>> Daniyar Yeralin >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Jun 21, 2019, at 1:44 PM, John Roesler <j...@confluent.io >>>>>>>>>>>>> <mailto:j...@confluent.io> <mailto:j...@confluent.io >>>>>>>>>>>>> <mailto:j...@confluent.io>> >>>>>>>>>>>>> <mailto:j...@confluent.io <mailto:j...@confluent.io> >>>>>>>>>>>>> <mailto:j...@confluent.io <mailto:j...@confluent.io>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Hey Daniyar, >>>>>>>>>>>>> >>>>>>>>>>>>> Looks good to me! Thanks for considering it. >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> -John >>>>>>>>>>>>> >>>>>>>>>>>>> On Fri, Jun 21, 2019 at 9:04 AM Development <d...@yeralin.net >>>>>>>>>>>>> <mailto:d...@yeralin.net> <mailto:d...@yeralin.net >>>>>>>>>>>>> <mailto:d...@yeralin.net>> >>>>>>>>>>>>> <mailto:d...@yeralin.net <mailto:d...@yeralin.net> >>>>>>>>>>>>> <mailto:d...@yeralin.net <mailto:d...@yeralin.net>>> >>>>>>>>>>>>> <mailto:d...@yeralin.net <mailto:d...@yeralin.net> >>>>>>>>>>>>> <mailto:d...@yeralin.net <mailto:d...@yeralin.net>>>> wrote: >>>>>>>>>>>>> Hey John and Matthias, >>>>>>>>>>>>> >>>>>>>>>>>>> Yes, now I see it all. I’m storing lots of redundant information. >>>>>>>>>>>>> Here is my final idea. Yes, now a user should pass a list type. I >>>>>>>>>>>>> realized that’s the type is not really needed in ListSerializer, >>>>>>>>>>>>> but >>>>>>>>>>>>> only in ListDeserializer: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> In ListSerializer we will start storing sizes only if serializer >>>>>>>>>>>>> is >>>>>>>>>>>>> not a primitive serializer: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Then, in deserializer, we persist passed list type, so that during >>>>>>>>>>>>> deserialization we could create an instance of it with predefined >>>>>>>>>>>>> listSize for better performance. >>>>>>>>>>>>> We also try to locate a primitiveSize based on passed >>>>>>>>>>>>> deserializer. >>>>>>>>>>>>> If it is not there, then primitiveSize will be null. Which means >>>>>>>>>>>>> that each entry’s size was encoded individually. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> This looks much cleaner and more concise. >>>>>>>>>>>>> >>>>>>>>>>>>> What do you think? >>>>>>>>>>>>> >>>>>>>>>>>>> Best, >>>>>>>>>>>>> Daniyar Yeralin >>>>>>>>>>>>> >>>>>>>>>>>>> On Jun 20, 2019, at 5:45 PM, Matthias J. Sax >>>>>>>>>>>>> <matth...@confluent.io <mailto:matth...@confluent.io> >>>>>>>>>>>>> <mailto:matth...@confluent.io <mailto:matth...@confluent.io>> >>>>>>>>>>>>> <mailto:matth...@confluent.io <mailto:matth...@confluent.io> >>>>>>>>>>>>> <mailto:matth...@confluent.io <mailto:matth...@confluent.io>>> >>>>>>>>>>>>> <mailto:matth...@confluent.io <mailto:matth...@confluent.io> >>>>>>>>>>>>> <mailto:matth...@confluent.io <mailto:matth...@confluent.io>>>> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> For encoding the list-type: I see John's point about re-encoding >>>>>>>>>>>>> the >>>>>>>>>>>>> list-type redundantly. However, I also don't like the idea that >>>>>>>>>>>>> the >>>>>>>>>>>>> Deserializer returns a fixed type... >>>>>>>>>>>>> >>>>>>>>>>>>> Maybe it's best allow users to specify the target list type on >>>>>>>>>>>>> deserialization via config? >>>>>>>>>>>>> >>>>>>>>>>>>> Similar for the primitive types: I don't think we need to encode >>>>>>>>>>>>> the >>>>>>>>>>>>> type size, but users could specify the type on the deserializer >>>>>>>>>>>>> (via a >>>>>>>>>>>>> config again)? >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> About generics: nesting could be arbitrarily deep. Hence, I doubt >>>>>>>>>>>>> we can >>>>>>>>>>>>> support this and a cast will be necessary at some point in the >>>>>>>>>>>>> user >>>>>>>>>>>>> code. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -Matthias >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On 6/20/19 1:21 PM, John Roesler wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Hey Daniyar, >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks for looking at it! >>>>>>>>>>>>> >>>>>>>>>>>>> Something like your screenshot is more along the lines of what I >>>>>>>>>>>>> was >>>>>>>>>>>>> thinking. Sorry, but I didn't follow what you mean, how would >>>>>>>>>>>>> that not >>>>>>>>>>>>> be "vanilla java"? >>>>>>>>>>>>> >>>>>>>>>>>>> Unfortunately the deserializer needs more information, though. For >>>>>>>>>>>>> example, what if the inner type is a Map<String,String>? The serde >>>>>>>>>>>>> could >>>>>>>>>>>>> only be used to produce a LinkedList<Map>, thus, we'd still need >>>>>>>>>>>>> an >>>>>>>>>>>>> inner serde, like you have in the KIP (Serde<T> innerSerde). >>>>>>>>>>>>> >>>>>>>>>>>>> Something more like Serde<LinkedList<MyRecord>> = >>>>>>>>>>>>> Serdes.listSerde( >>>>>>>>>>>>> /**list type**/ LinkedList.class, >>>>>>>>>>>>> /**inner serde**/ new MyRecordSerde() >>>>>>>>>>>>> ) >>>>>>>>>>>>> >>>>>>>>>>>>> And in configuration, it's something like: >>>>>>>>>>>>> default.key.serde: org...ListSerde >>>>>>>>>>>>> default.key.list.serde.type: java.util.LinkedList >>>>>>>>>>>>> default.key.list.serde.inner: com.mycompany.MyRecordSerde >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> What do you think? >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> -John >>>>>>>>>>>>> >>>>>>>>>>>>> On Thu, Jun 20, 2019 at 2:46 PM Development <d...@yeralin.net >>>>>>>>>>>>> <mailto:d...@yeralin.net> <mailto:d...@yeralin.net >>>>>>>>>>>>> <mailto:d...@yeralin.net>> >>>>>>>>>>>>> <mailto:d...@yeralin.net <mailto:d...@yeralin.net> >>>>>>>>>>>>> <mailto:d...@yeralin.net <mailto:d...@yeralin.net>>> >>>>>>>>>>>>> <mailto:d...@yeralin.net <mailto:d...@yeralin.net> >>>>>>>>>>>>> <mailto:d...@yeralin.net <mailto:d...@yeralin.net>>> >>>>>>>>>>>>> <mailto:d...@yeralin.net <mailto:d...@yeralin.net> >>>>>>>>>>>>> <mailto:d...@yeralin.net <mailto:d...@yeralin.net>> >>>>>>>>>>>>> <mailto:d...@yeralin.net <mailto:d...@yeralin.net> >>>>>>>>>>>>> <mailto:d...@yeralin.net <mailto:d...@yeralin.net>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Hey John, >>>>>>>>>>>>> >>>>>>>>>>>>> I gave read about TypeReference. It could work for the list serde. >>>>>>>>>>>>> However, it is not directly >>>>>>>>>>>>> supported: >>>>>>>>>>>>> https://github.com/FasterXML/jackson-databind/issues/1490 >>>>>>>>>>>>> <https://github.com/FasterXML/jackson-databind/issues/1490> >>>>>>>>>>>>> <https://github.com/FasterXML/jackson-databind/issues/1490 >>>>>>>>>>>>> <https://github.com/FasterXML/jackson-databind/issues/1490>> >>>>>>>>>>>>> <https://github.com/FasterXML/jackson-databind/issues/1490 >>>>>>>>>>>>> <https://github.com/FasterXML/jackson-databind/issues/1490> >>>>>>>>>>>>> <https://github.com/FasterXML/jackson-databind/issues/1490 >>>>>>>>>>>>> <https://github.com/FasterXML/jackson-databind/issues/1490>>> >>>>>>>>>>>>> The only way is to pass an actual class object into the >>>>>>>>>>>>> constructor, >>>>>>>>>>>>> something like: >>>>>>>>>>>>> >>>>>>>>>>>>> It could be an option, but not a pretty one. What do you think of >>>>>>>>>>>>> my >>>>>>>>>>>>> approach to use vanilla java and canonical class name? (As >>>>>>>>>>>>> described >>>>>>>>>>>>> previously) >>>>>>>>>>>>> >>>>>>>>>>>>> Best, >>>>>>>>>>>>> Daniyar Yeralin >>>>>>>>>>>>> >>>>>>>>>>>>> On Jun 20, 2019, at 2:45 PM, Development <d...@yeralin.net >>>>>>>>>>>>> <mailto:d...@yeralin.net> <mailto:d...@yeralin.net >>>>>>>>>>>>> <mailto:d...@yeralin.net>> >>>>>>>>>>>>> <mailto:d...@yeralin.net <mailto:d...@yeralin.net> >>>>>>>>>>>>> <mailto:d...@yeralin.net <mailto:d...@yeralin.net>>> >>>>>>>>>>>>> <mailto:d...@yeralin.net <mailto:d...@yeralin.net> >>>>>>>>>>>>> <mailto:d...@yeralin.net <mailto:d...@yeralin.net>>> >>>>>>>>>>>>> <mailto:d...@yeralin.net <mailto:d...@yeralin.net> >>>>>>>>>>>>> <mailto:d...@yeralin.net <mailto:d...@yeralin.net>> >>>>>>>>>>>>> <mailto:d...@yeralin.net <mailto:d...@yeralin.net> >>>>>>>>>>>>> <mailto:d...@yeralin.net <mailto:d...@yeralin.net>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Hi John, >>>>>>>>>>>>> >>>>>>>>>>>>> Thank you for your input! Yes, my idea looks a little bit over >>>>>>>>>>>>> engineered :) >>>>>>>>>>>>> >>>>>>>>>>>>> I also wanted to see a feedback from Mathias as well since he gave >>>>>>>>>>>>> me an idea about storing fixed/variable size entries. >>>>>>>>>>>>> >>>>>>>>>>>>> Best, >>>>>>>>>>>>> Daniyar Yeralin >>>>>>>>>>>>> >>>>>>>>>>>>> On Jun 18, 2019, at 6:06 PM, John Roesler <j...@confluent.io >>>>>>>>>>>>> <mailto:j...@confluent.io> <mailto:j...@confluent.io >>>>>>>>>>>>> <mailto:j...@confluent.io>> >>>>>>>>>>>>> <mailto:j...@confluent.io <mailto:j...@confluent.io> >>>>>>>>>>>>> <mailto:j...@confluent.io <mailto:j...@confluent.io>>> >>>>>>>>>>>>> <mailto:j...@confluent.io <mailto:j...@confluent.io> >>>>>>>>>>>>> <mailto:j...@confluent.io <mailto:j...@confluent.io>>> >>>>>>>>>>>>> <mailto:j...@confluent.io <mailto:j...@confluent.io> >>>>>>>>>>>>> <mailto:j...@confluent.io <mailto:j...@confluent.io>> >>>>>>>>>>>>> <mailto:j...@confluent.io <mailto:j...@confluent.io> >>>>>>>>>>>>> <mailto:j...@confluent.io <mailto:j...@confluent.io>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Hi Daniyar, >>>>>>>>>>>>> >>>>>>>>>>>>> That's a very clever solution! >>>>>>>>>>>>> >>>>>>>>>>>>> One observation is that, now, this is what we might call a >>>>>>>>>>>>> polymorphic >>>>>>>>>>>>> serde. That is, you're detecting the actual concrete type and then >>>>>>>>>>>>> promising to produce the exact same concrete type on read. >>>>>>>>>>>>> There are >>>>>>>>>>>>> some inherent problems with this approach, which in general >>>>>>>>>>>>> require >>>>>>>>>>>>> some kind of schema registry (not necessarily Schema >>>>>>>>>>>>> Registry, just >>>>>>>>>>>>> any registry for schemas) to solve. >>>>>>>>>>>>> >>>>>>>>>>>>> Notice that every serialized record has quite a bit of duplicated >>>>>>>>>>>>> information: the concrete type as well as a byte to indicate >>>>>>>>>>>>> whether >>>>>>>>>>>>> the value type is a fixed size, and, if so, an integer to >>>>>>>>>>>>> indicate the >>>>>>>>>>>>> actual size. These constitute a schema, of sorts, because they >>>>>>>>>>>>> tell us >>>>>>>>>>>>> later how exactly to deserialize the data. Unfortunately, this >>>>>>>>>>>>> information is completely redundant. In all likelihood, the >>>>>>>>>>>>> information will be exactly the same for every record in the >>>>>>>>>>>>> topic. >>>>>>>>>>>>> This problem is essentially the core motivation for serializations >>>>>>>>>>>>> like Avro: to move the schema outside of the serialization >>>>>>>>>>>>> itself, so >>>>>>>>>>>>> that the records won't contain so much redundant information. >>>>>>>>>>>>> >>>>>>>>>>>>> In this light, I'm wondering if it makes sense to go back to >>>>>>>>>>>>> something >>>>>>>>>>>>> like what you had earlier in which you don't support perfectly >>>>>>>>>>>>> preserving the concrete type for _this_ serde, but instead just >>>>>>>>>>>>> support deserializing to _some_ List. Then, you could defer full, >>>>>>>>>>>>> perfect, type preservation to serdes that have an external >>>>>>>>>>>>> system in >>>>>>>>>>>>> which to register their type information. >>>>>>>>>>>>> >>>>>>>>>>>>> There does exist an alternative, if we really do want to >>>>>>>>>>>>> preserve the >>>>>>>>>>>>> concrete type (which does seem kind of nice). You can add a >>>>>>>>>>>>> configuration option specifically for the serde to configure >>>>>>>>>>>>> what the >>>>>>>>>>>>> list type will be, and maybe what the element type is, as well. >>>>>>>>>>>>> >>>>>>>>>>>>> As far as "related work" goes, you might be interested to take >>>>>>>>>>>>> a look >>>>>>>>>>>>> at how Jackson can be configured to deserialize into a specific, >>>>>>>>>>>>> arbitrarily nested, generically parameterized class structure. >>>>>>>>>>>>> Specifically, you might find >>>>>>>>>>>>> https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html >>>>>>>>>>>>> >>>>>>>>>>>>> <https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html> >>>>>>>>>>>>> >>>>>>>>>>>>> <https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html >>>>>>>>>>>>> >>>>>>>>>>>>> <https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html>> >>>>>>>>>>>>> <https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html >>>>>>>>>>>>> >>>>>>>>>>>>> <https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html> >>>>>>>>>>>>> >>>>>>>>>>>>> <https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html >>>>>>>>>>>>> >>>>>>>>>>>>> <https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html>>> >>>>>>>>>>>>> interesting. >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> -John >>>>>>>>>>>>> >>>>>>>>>>>>> On Mon, Jun 17, 2019 at 12:38 PM Development <d...@yeralin.net >>>>>>>>>>>>> <mailto:d...@yeralin.net> <mailto:d...@yeralin.net >>>>>>>>>>>>> <mailto:d...@yeralin.net>> >>>>>>>>>>>>> <mailto:d...@yeralin.net <mailto:d...@yeralin.net> >>>>>>>>>>>>> <mailto:d...@yeralin.net <mailto:d...@yeralin.net>>> >>>>>>>>>>>>> <mailto:d...@yeralin.net <mailto:d...@yeralin.net> >>>>>>>>>>>>> <mailto:d...@yeralin.net <mailto:d...@yeralin.net>>> >>>>>>>>>>>>> <mailto:d...@yeralin.net <mailto:d...@yeralin.net> >>>>>>>>>>>>> <mailto:d...@yeralin.net <mailto:d...@yeralin.net>> >>>>>>>>>>>>> <mailto:d...@yeralin.net <mailto:d...@yeralin.net> >>>>>>>>>>>>> <mailto:d...@yeralin.net <mailto:d...@yeralin.net>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> bump >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>> >>> >> >> >