[ https://issues.apache.org/jira/browse/KAFKA-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16373755#comment-16373755 ]

Ewen Cheslack-Postava commented on KAFKA-6002:
----------------------------------------------

[~edvard.poliakov] Not sure if you made progress on this. The schema support in 
JsonConverter doesn't use json-schema.org style, partly because that spec is 
quite complicated. To be honest, the inline schema support with JSON is really 
more for demonstrative purposes: baking in and supporting a ton of formats adds 
a lot of overhead to the project, so we wanted to ship just one format with the 
framework and leave the rest to be community supported. However, that meant we 
needed to include something with both schema and schemaless modes, so we could 
demonstrate both and ensure everything works either way. We ended up doing this 
with JSON and an ad hoc schema format. But generally, when using a schema you 
want something that doesn't need to ship the full schema inline with the 
message, because that's quite heavyweight; often the schema ends up larger than 
the message data itself!

For a complete JSON w/ schemas solution, I would probably suggest implementing 
Converters that look a lot like what Confluent has for Avro and using 
json-schema.org to express the schemas. The one difference is that now that we 
have headers, I'd put the schema ID information into a header instead and make 
the value just the JSON payload (whereas Avro has some additional framing in 
the value itself).
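
To make that concrete, here is a rough sketch (not a tested or complete 
implementation) of what such a header-based converter could look like. The 
class name, the "json.schema.id" header name, and the in-memory registry are 
all made up for illustration, and the header-aware Converter overloads it 
relies on only exist in newer Kafka releases:

{code}
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.kafka.common.header.Header;
import org.apache.kafka.common.header.Headers;
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.SchemaAndValue;
import org.apache.kafka.connect.json.JsonConverter;
import org.apache.kafka.connect.storage.Converter;

/**
 * Sketch only: the value stays a bare JSON payload, while the schema travels
 * as an ID in a record header. The header name and the in-memory "registry"
 * are placeholders for whatever registry service you would actually use.
 */
public class HeaderSchemaJsonConverter implements Converter {

    private static final String SCHEMA_ID_HEADER = "json.schema.id"; // illustrative name

    private final JsonConverter payloadConverter = new JsonConverter();
    private final InMemorySchemaRegistry registry = new InMemorySchemaRegistry();

    @Override
    public void configure(Map<String, ?> configs, boolean isKey) {
        // Serialize payloads without the inline schema envelope that
        // JsonConverter adds when schemas.enable=true.
        payloadConverter.configure(Map.of("schemas.enable", false), isKey);
    }

    @Override
    public byte[] fromConnectData(String topic, Headers headers, Schema schema, Object value) {
        if (schema != null) {
            // Register the schema and attach only its ID, instead of shipping
            // the full schema with every message.
            String id = registry.register(topic, schema);
            headers.add(SCHEMA_ID_HEADER, id.getBytes(StandardCharsets.UTF_8));
        }
        return payloadConverter.fromConnectData(topic, schema, value);
    }

    @Override
    public byte[] fromConnectData(String topic, Schema schema, Object value) {
        return payloadConverter.fromConnectData(topic, schema, value);
    }

    @Override
    public SchemaAndValue toConnectData(String topic, Headers headers, byte[] value) {
        SchemaAndValue parsed = payloadConverter.toConnectData(topic, value);
        Header idHeader = headers.lastHeader(SCHEMA_ID_HEADER);
        if (idHeader == null)
            return parsed; // no header, so treat the data as schemaless
        Schema schema = registry.lookup(new String(idHeader.value(), StandardCharsets.UTF_8));
        // A complete implementation would also coerce the parsed Maps/Lists
        // into Structs matching the schema; omitted here for brevity.
        return new SchemaAndValue(schema, parsed.value());
    }

    @Override
    public SchemaAndValue toConnectData(String topic, byte[] value) {
        return payloadConverter.toConnectData(topic, value);
    }

    /** In-memory stand-in for a real schema registry service. */
    static class InMemorySchemaRegistry {
        private final Map<String, Schema> byId = new ConcurrentHashMap<>();

        String register(String topic, Schema schema) {
            String id = topic + "-" + Integer.toHexString(schema.hashCode()); // naive ID scheme
            byId.putIfAbsent(id, schema);
            return id;
        }

        Schema lookup(String id) {
            return byId.get(id);
        }
    }
}
{code}

The key point is that the serialized value is plain JSON with no framing, and 
the schema lookup happens out of band via the header, much like the registry 
lookup Confluent's Avro converter does through the bytes it prepends to the 
value.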

For a transformation that does this, you *could* just omit the schema entirely; 
schemaless data is a supported option in Connect. That would just mean the 
transform only works when the user/connectors expect schemaless data.
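
As a rough illustration (again just a sketch, with a made-up class name and a 
hypothetical "field" config), a schemaless-only version of the transform could 
look something like this:

{code}
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.ConnectRecord;
import org.apache.kafka.connect.errors.DataException;
import org.apache.kafka.connect.transforms.Transformation;

/**
 * Schemaless-only sketch: expands a JSON string held in one field of a Map
 * value into nested Maps/Lists. The "field" config name is hypothetical.
 */
public class ParseJsonString<R extends ConnectRecord<R>> implements Transformation<R> {

    public static final ConfigDef CONFIG_DEF = new ConfigDef()
            .define("field", ConfigDef.Type.STRING, ConfigDef.Importance.HIGH,
                    "Field containing a JSON string to expand into an object");

    private static final ObjectMapper MAPPER = new ObjectMapper();
    private String field;

    @Override
    public void configure(Map<String, ?> configs) {
        field = (String) configs.get("field");
    }

    @Override
    public R apply(R record) {
        if (record.value() == null)
            return record;
        if (!(record.value() instanceof Map))
            throw new DataException("ParseJsonString only supports schemaless Map values");

        @SuppressWarnings("unchecked")
        Map<String, Object> value = new HashMap<>((Map<String, Object>) record.value());
        Object raw = value.get(field);
        if (raw instanceof String) {
            try {
                // Replace the embedded JSON string with the parsed structure,
                // e.g. {"a": "{\"b\": 23}"} becomes {"a": {"b": 23}}.
                value.put(field, MAPPER.readValue((String) raw, Object.class));
            } catch (IOException e) {
                throw new DataException("Field '" + field + "' is not valid JSON", e);
            }
        }
        // Keep the value schema null: this transform is schemaless-only.
        return record.newRecord(record.topic(), record.kafkaPartition(), record.keySchema(),
                record.key(), null, value, record.timestamp());
    }

    @Override
    public ConfigDef config() {
        return CONFIG_DEF;
    }

    @Override
    public void close() {
    }
}
{code}

It would be wired into a connector the usual way, e.g. transforms=parsejson, 
transforms.parsejson.type=<package>.ParseJsonString, 
transforms.parsejson.field=a.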

Regarding inference, you can also just do this on a per-message basis instead 
of continuously updating a schema. There is a risk that you end up with lots of 
schemas that way (since each one could be unique), but in a lot of cases that 
isn't expected to happen. I also have an SMT that infers schemas, so it does 
something similar to what you'd need here: 
[https://github.com/ewencp/kafka/commit/3abb54a8062fe727ddaabc4dd5a552dd0b465a03]
I didn't complete both modes, but the idea was to allow either inferring on a 
per-message basis *or* specifying a schema (whether the JsonConverter variant 
or json-schema.org style) and validating the data & adding the schema to the 
record. I think offering those two options in your SMT would give good 
flexibility as well.
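
For reference, per-message inference over already-parsed JSON (Maps, Lists, 
and primitives) can be fairly small. This is only a sketch of one possible 
policy; the fallbacks for nulls and empty arrays are exactly the ambiguous 
cases called out in the issue, and you may well want to handle them 
differently:

{code}
import java.util.List;
import java.util.Map;

import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.SchemaBuilder;

/**
 * Per-message inference sketch: derive a Connect Schema from an already-parsed
 * JSON value. Nulls and empty arrays carry no type information, so they fall
 * back to optional strings here; that is one possible policy, not the only one.
 */
public class JsonSchemaInference {

    public static Schema infer(Object value) {
        if (value == null || value instanceof String)
            return Schema.OPTIONAL_STRING_SCHEMA;      // arbitrary fallback for nulls
        if (value instanceof Boolean)
            return Schema.OPTIONAL_BOOLEAN_SCHEMA;
        if (value instanceof Integer || value instanceof Long)
            return Schema.OPTIONAL_INT64_SCHEMA;
        if (value instanceof Number)                   // Double, BigDecimal, ...
            return Schema.OPTIONAL_FLOAT64_SCHEMA;
        if (value instanceof List) {
            List<?> list = (List<?>) value;
            // Infer the element schema from the first element only; a fuller
            // version would unify the schemas of all elements.
            Schema element = list.isEmpty() ? Schema.OPTIONAL_STRING_SCHEMA : infer(list.get(0));
            return SchemaBuilder.array(element).optional().build();
        }
        if (value instanceof Map) {
            SchemaBuilder struct = SchemaBuilder.struct().optional();
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet())
                struct.field(entry.getKey().toString(), infer(entry.getValue()));
            return struct.build();
        }
        throw new IllegalArgumentException("Unsupported JSON value type: " + value.getClass());
    }
}
{code}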

> Kafka Connect Transform transforming JSON string into actual object
> -------------------------------------------------------------------
>
>                 Key: KAFKA-6002
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6002
>             Project: Kafka
>          Issue Type: Improvement
>          Components: KafkaConnect
>            Reporter: Edvard Poliakov
>            Priority: Minor
>
> My colleague and I have been working on a new Transform that takes a JSON 
> string and transforms it into an actual object, like this:
> {code} 
> {
>   "a" : "{\"b\": 23}"
> }
> {code}
> into
> {code}
> {
>   "a" : {
>        "b" : 23
>   }
> }
> {code}
> There is no robust way of building a Schema from a JSON object itself, as it 
> can be something like an empty array or a null that doesn't provide any info 
> on the schema of the object. So I see two options here.
> 1. Have the transform take a schema as a transform parameter. The problem I 
> found with this is that it is not clear which JSON schema specification 
> should be used. I assume it would be reasonable to use 
> http://json-schema.org/, but Kafka Connect doesn't seem to support it 
> currently; moreover, reading through the JsonConverter class in Kafka 
> Connect, I am not able to work out what spec the JSON schema used in that 
> class follows, for example in the {{asConnectSchema}} method on 
> {{JsonConverter}}, [see 
> here|https://github.com/apache/kafka/blob/trunk/connect/json/src/main/java/org/apache/kafka/connect/json/JsonConverter.java#L415].
> 2. On each object received, keep updating the schema, but I can't see a 
> standard and robust way of handling the edge cases.
> I am happy to create a pull request for this transform, if we can agree on 
> something here. :)



