eolivelli opened a new issue #9844:
URL: https://github.com/apache/pulsar/issues/9844


   We have the `KeyValue` schema that supports a generic key-value model, and 
both the key and the value have a schema.
   
   When dealing with structured data types, you currently use 
`Sink<GenericRecord>` with the `AUTO_CONSUME` schema. This way you can 
automatically handle any supported form of data structure.
   
   But if you use `AUTO_CONSUME` you cannot consume `KeyValue` records.
   
   **Describe the solution you'd like**
   I would like `AUTO_CONSUME` to handle the `KeyValue` schema by passing a 
special `GenericRecord` instance with two fields:
   - key
   - value
   GenericRecord already supports nested data structures, so it is possible to 
set the schema for the key field and for the value field.
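
The proposed shape can be sketched roughly as follows. This is only an illustration of the idea, not Pulsar code: `SketchRecord` and `KeyValueWrapping` are hypothetical stand-ins invented for this example, and the real `GenericRecord` interface and `AUTO_CONSUME` implementation may look quite different.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a generic record: an ordered set of named fields.
// This is NOT Pulsar's GenericRecord; it only models the proposal.
final class SketchRecord {
    private final Map<String, Object> fields = new LinkedHashMap<>();

    // Sets a field and returns this record so calls can be chained.
    SketchRecord set(String name, Object value) {
        fields.put(name, value);
        return this;
    }

    Object getField(String name) {
        return fields.get(name);
    }

    // Field names in insertion order.
    List<String> fieldNames() {
        return new ArrayList<>(fields.keySet());
    }
}

final class KeyValueWrapping {
    // Wraps a key/value pair into a record with exactly two fields: "key" and "value".
    // Either side may itself be a nested SketchRecord.
    static SketchRecord wrap(Object key, Object value) {
        return new SketchRecord().set("key", key).set("value", value);
    }

    public static void main(String[] args) {
        SketchRecord key = new SketchRecord().set("id", 42);
        SketchRecord value = new SketchRecord().set("name", "alice").set("age", 30);
        SketchRecord wrapped = wrap(key, value);

        // The consumer only ever sees one record type and reads nested fields from it.
        SketchRecord nestedValue = (SketchRecord) wrapped.getField("value");
        System.out.println(wrapped.fieldNames());
        System.out.println(nestedValue.getField("name"));
    }
}
```

With this shape, a sink would read `key` and `value` the same way it reads any other nested field.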
   
   Advanced processors that can deal with nested structures will benefit 
from this new feature: they will automatically be able to handle KeyValue 
without changes and in a consistent way, because they only need to deal with 
GenericRecord, which is the generic key-value dictionary we have in Pulsar.
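
As a rough illustration of why such processors would need no changes, here is a minimal sketch of a generic processor that flattens any nested structure into dotted paths. Plain `Map` instances stand in for nested records, and `NestedFlattener` is a hypothetical name for this example, not part of Pulsar.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical processor: it knows nothing about key/value pairs, only about nesting.
final class NestedFlattener {
    // Flattens nested maps into dotted paths,
    // e.g. {"value": {"name": "x"}} becomes {"value.name": "x"}.
    static Map<String, Object> flatten(Map<String, ?> record) {
        Map<String, Object> out = new LinkedHashMap<>();
        flattenInto("", record, out);
        return out;
    }

    private static void flattenInto(String prefix, Map<?, ?> record, Map<String, Object> out) {
        for (Map.Entry<?, ?> entry : record.entrySet()) {
            String name = String.valueOf(entry.getKey());
            String path = prefix.isEmpty() ? name : prefix + "." + name;
            if (entry.getValue() instanceof Map) {
                // Nested structure: recurse, extending the path.
                flattenInto(path, (Map<?, ?>) entry.getValue(), out);
            } else {
                // Leaf value: record it under its full path.
                out.put(path, entry.getValue());
            }
        }
    }
}
```

If a key/value pair arrives as a record with `key` and `value` fields, this processor yields paths such as `key.id` and `value.name` without any special handling.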
   
   **Describe alternatives you've considered**
   Modifying all of the connectors to deal with both KeyValue and 
GenericRecord. This would be a big effort, and currently (2.7.x) you 
cannot have a Sink that deals with two separate data types (the user must 
explicitly set a "classname").
   
   **Additional context**
   I have implementations of Sinks that deal with generic data structures and 
allow the user to transform/map the data before writing to the external system.
   


