EDIT: everywhere I've been using "HEX" I meant to be using "BASE64". I will
update the KIP to reflect this.

On Wed, Aug 7, 2019 at 9:44 AM Almog Gavra <al...@confluent.io> wrote:

> Thanks for the feedback Arjun! I'm happy changing the default config to
> HEX instead of BINARY, no strong feelings there.
>
> I'll also clarify the example in the KIP to be clearer:
>
> - serialize the decimal field "foo" with value "10.2345" with the HEX
> setting: {"foo": "D3J5"}
> - serialize the decimal field "foo" with value "10.2345" with the NUMERIC
> setting: {"foo": 10.2345}
>
> With regards to the precision issue, that was my original concern as well
> (and why I originally suggested a TEXT format). Many JSON deserializers
> (e.g. Jackson with DeserializationFeature.USE_BIG_DECIMAL_FOR_FLOATS),
> however, have the ability to deserialize decimals correctly so I will
> configure that as the default for Connect's JsonDeserializer. It's probably
> a good idea to call out that using other deserializers must be done with
> care - I will add that documentation to the serialization config.
>
> Note that there would not be an issue on the _serialization_ side of
> things as Jackson respects BigDecimal.
>
> Almog
>
> On Tue, Aug 6, 2019 at 11:23 PM Arjun Satish <arjun.sat...@gmail.com>
> wrote:
>
>> hey Almog, nice work! couple of thoughts (hope I'm not late since you
>> started the voting thread already):
>>
>> can you please add some examples to show the changes that you are
>> proposing. makes me think that for a given decimal number, we will have
>> two
>> encodings: “asHex” and “asNumber”.
>>
>> should we call the default config value “HEX” instead of “BINARY”?
>>
>> Should we call out the fact that JS systems might be susceptible to double
>> precision round offs with the new numeric format? here are some people
>> discussing a similar problem
>> https://github.com/EventStore/EventStore/issues/1541
>>
>> On Tue, Aug 6, 2019 at 1:40 PM Almog Gavra <al...@confluent.io> wrote:
>>
>> > Hello Everyone,
>> >
>> > Summarizing an in-person discussion with Randall (this is copied from
>> the
>> > KIP):
>> >
>> > The original KIP suggested supporting an additional representation -
>> base10
>> > encoded text (e.g. `{"asText":"10.2345"}`). This causes issues because
>> it
>> > is impossible to disambiguate between TEXT and BINARY without an
>> additional
>> > config - furthermore, this makes the migration from one to the other
>> nearly
>> > impossible because it would require that all consumers stop consuming
>> and
>> > producers stop producing and atomically updating the config on all of
>> them
>> > after deploying the new code, or waiting for the full retention period
>> to
>> > pass - neither option is viable. The suggestion in the KIP is strictly
>> an
>> > improvement over the existing behavior, even if it doesn't support all
>> > combinations.
>> >
>> > It seems that since most real-world use cases actually use the numeric
>> > representation (not string) we can consider this an improvement. With
>> the
>> > new suggestion, we don't need a deserialization configuration (only a
>> > serialization option) and the consumers will be able to always
>> > automatically determine the serialization format.
>> >
>> > Based on this, I'll be opening up the simplified version of the KIP to a
>> > vote.
>> >
>> > Almog
>> >
>> > On Mon, Jul 29, 2019 at 9:29 AM Almog Gavra <al...@confluent.io> wrote:
>> >
>> > > I'm mostly happy with your current suggestion (two configs, one for
>> > > serialization and one for deserialization) and your implementation
>> > > suggestion. One thing to note:
>> > >
>> > > > We should _always_ be able to deserialize a standard JSON
>> > > > number as a decimal
>> > >
>> > > I was doing some research into decimals and JSON, and I can imagine a
>> > > compelling reason to require string representations to avoid losing
>> > > precision and to be certain that whomever is sending the data isn't
>> > losing
>> > > precision (e.g. https://stackoverflow.com/a/38357877/2258040).
>> > >
>> > > I'm okay with always allowing numerics, but thought it's worth raising
>> > the
>> > > thought.
>> > >
>> > > On Mon, Jul 29, 2019 at 4:57 AM Andy Coates <a...@confluent.io>
>> wrote:
>> > >
>> > >> The way I see it, we need to control two seperate things:
>> > >>
>> > >> 1. How do we _deserialize_ a decimal type if we encounter a text
>> node in
>> > >> the JSON?    (We should _always_ be able to deserialize a standard
>> JSON
>> > >> number as a decimal).
>> > >> 2. How do we chose how we want decimals to be _serialized_.
>> > >>
>> > >> This looks to fits well with your second suggestion of slightly
>> > different
>> > >> configs names for serialization vs deserialization.
>> > >> a, For deserialization we only care about how to handle text nodes: `
>> > >> deserialization.decimal.*text*.format`, which should only have two
>> valid
>> > >> values BINARY | TEXT.
>> > >> b. For serialization we need all three:
>> `serialization.decimal.format`,
>> > >> which should support all three options: BINARY | TEXT | NUMERIC.
>> > >>
>> > >> Implementation wise, I think these should be two separate enums,
>> rather
>> > >> than one shared enum and throwing an error if the deserializer is
>> set to
>> > >> NUMERIC.  Mainly as this means the enums reflect the options
>> available,
>> > >> rather than this being hidden in config checking code.  But that's a
>> > minor
>> > >> implementation detail.
>> > >>
>> > >> Personally, I'd be tempted to have the BINARY value named something
>> like
>> > >> `LEGACY` or `LEGACY_BINARY` as a way of encouraging users to move
>> away
>> > >> from
>> > >> it.
>> > >>
>> > >> It's a real shame that both of these settings require a default of
>> > BINARY
>> > >> for backwards compatibility, but I agree that discussions / plans
>> around
>> > >> switching the defaults should not block this KIP.
>> > >>
>> > >> Andy
>> > >>
>> > >>
>> > >> On Thu, 25 Jul 2019 at 18:26, Almog Gavra <al...@confluent.io>
>> wrote:
>> > >>
>> > >> > Thanks for the replies Andy and Andrew (2x Andy?)!
>> > >> >
>> > >> > > Is the text decimal a base16 encoded number, or is it base16
>> encoded
>> > >> > binary
>> > >> > > form of the number?
>> > >> >
>> > >> > The conversion happens as decimal.unscaledValue().toByteArray() and
>> > then
>> > >> > the byte array is converted to a hex string, so it's definitely the
>> > >> binary
>> > >> > form of the number converted to base16. Whether or not that's the
>> same
>> > >> as
>> > >> > the base16 encoded number is a good question (toByteArray returns a
>> > byte
>> > >> > array containing a signed, big-endian, two's complement
>> representation
>> > >> of
>> > >> > the big integer).
>> > >> >
>> > >> > > One suggestion I have is to change the proposed new config to
>> only
>> > >> affect
>> > >> > > decimals stored as text, i.e. to switch between the current
>> base16
>> > and
>> > >> > the
>> > >> > > more common base10.   Then add another config to the serializer
>> only
>> > >> that
>> > >> > > controls if decimals should be serialized as text or numeric.
>> > >> >
>> > >> > I think we need to be able to handle all mappings from
>> serialization
>> > >> format
>> > >> > to deserialization format (e.g. read in BINARY and output TEXT),
>> > which I
>> > >> > think would be impossible with the alternative suggestion. I agree
>> > that
>> > >> > automatically deserializing numerics is valuable. I see two other
>> ways
>> > >> to
>> > >> > get this, both keeping the serialization.format config the same:
>> > >> >
>> > >> > - have json.decimal.deserialization.format accept all three
>> formats.
>> > if
>> > >> set
>> > >> > to BINARY/TEXT, numerics would be automatically supported. If set
>> to
>> > >> > NUMERIC, then any string coming in would result in deserialization
>> > error
>> > >> > (defaults to BINARY for backwards compatibility)
>> > >> > - change json.decimal.deserialization.format to
>> > >> > json.decimal.deserialization.string.format which accepts only
>> > >> BINARY/TEXT
>> > >> > (defaults to BINARY for backwards compatibility)
>> > >> >
>> > >> > > would be a breaking change in that things that previously failed
>> > would
>> > >> > > suddenly start deserializing.  This is a price I'm willing to
>> pay.
>> > >> >
>> > >> > I agree. I'm willing to pay this price too.
>> > >> >
>> > >> > > IMHO, we should then plan to switch the default of decimal
>> > >> serialization
>> > >> > to
>> > >> > > numeric, and text serialization to base 10 in the next major
>> > release.
>> > >> >
>> > >> > I think that can be a separate discussion, I don't want to block
>> this
>> > >> KIP
>> > >> > on it.
>> > >> >
>> > >> > Thoughts?
>> > >> >
>> > >> > On Thu, Jul 25, 2019 at 6:35 AM Andrew Otto <o...@wikimedia.org>
>> > wrote:
>> > >> >
>> > >> > > This is a bit orthogonal, but in JsonSchemaConverter I use
>> > >> JSONSchemas to
>> > >> > > indicate whether a JSON number should be deserialized as an
>> integer
>> > >> or a
>> > >> > > decimal
>> > >> > > <
>> > >> > >
>> > >> >
>> > >>
>> >
>> https://github.com/ottomata/kafka-connect-jsonschema/blob/master/src/main/java/org/wikimedia/kafka/connect/jsonschema/JsonSchemaConverter.java#L251-L261
>> > >> > > >.
>> > >> > > Not everyone is going to have JSONSchemas available when
>> converting,
>> > >> but
>> > >> > if
>> > >> > > you do, it is an easy way to support JSON numbers as decimals.
>> > >> > >
>> > >> > > Carry on! :)
>> > >> > >
>> > >> > > On Thu, Jul 25, 2019 at 9:12 AM Andy Coates <a...@confluent.io>
>> > >> wrote:
>> > >> > >
>> > >> > > > Hi Almog,
>> > >> > > >
>> > >> > > > Like the KIP - I think being able to support decimals in JSON
>> in
>> > the
>> > >> > same
>> > >> > > > way most other systems do is a great improvement.
>> > >> > > >
>> > >> > > > It's not 100% clear to me from the KIP what the current format
>> is.
>> > >> Is
>> > >> > > the
>> > >> > > > text decimal a base16 encoded number, or is it base16 encoded
>> > binary
>> > >> > form
>> > >> > > > of the number? (I've not tried to get my head around if these
>> two
>> > >> are
>> > >> > > even
>> > >> > > > different!)
>> > >> > > >
>> > >> > > > One suggestion I have is to change the proposed new config to
>> only
>> > >> > affect
>> > >> > > > decimals stored as text, i.e. to switch between the current
>> base16
>> > >> and
>> > >> > > the
>> > >> > > > more common base10.   Then add another config to the serialzier
>> > only
>> > >> > that
>> > >> > > > controls if decimals should be serialized as text or numeric.
>> The
>> > >> > > benefit
>> > >> > > > of this approach is it allows us to enhance the deserializer to
>> > >> > > > automatically handle numeric decimals even without any config
>> > >> having to
>> > >> > > be
>> > >> > > > set, i.e. default config in the deserializer would be able to
>> > handle
>> > >> > > > numeric decimals.  Of course, this is a two edged sword: this
>> > would
>> > >> > make
>> > >> > > > the deserializer work out of the box with numeric decimals,
>> > (yay!),
>> > >> but
>> > >> > > > would be a breaking change in that things that previously
>> failed
>> > >> would
>> > >> > > > suddenly start deserializing.  This is a price I'm willing to
>> pay.
>> > >> > > >
>> > >> > > > IMHO, we should then plan to switch the default of decimal
>> > >> > serialization
>> > >> > > to
>> > >> > > > numeric, and text serialization to base 10 in the next major
>> > >> release.
>> > >> > > > (With upgrade notes to match). Though I know this is more
>> > >> contentious,
>> > >> > I
>> > >> > > > think it moves us forward in a much more standard way that the
>> > >> current
>> > >> > > > encoding of decimals.
>> > >> > > >
>> > >> > > > On Tue, 25 Jun 2019 at 01:03, Almog Gavra <al...@confluent.io>
>> > >> wrote:
>> > >> > > >
>> > >> > > > > Hi Everyone!
>> > >> > > > >
>> > >> > > > > Kicking off discussion for a new KIP:
>> > >> > > > >
>> > >> > > > >
>> > >> > > >
>> > >> > >
>> > >> >
>> > >>
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-481%3A+SerDe+Improvements+for+Connect+Decimal+type+in+JSON
>> > >> > > > >
>> > >> > > > > For those who are interested, I have a prototype
>> implementation
>> > >> that
>> > >> > > > helped
>> > >> > > > > guide my design: https://github.com/agavra/kafka/pull/1
>> > >> > > > >
>> > >> > > > > Cheers,
>> > >> > > > > Almog
>> > >> > > > >
>> > >> > > >
>> > >> > >
>> > >> >
>> > >>
>> > >
>> >
>>
>

Reply via email to