Thanks for the update Zelin. Currently, the intermediate records from the Kafka source are string type, but for debezium-avro the intermediate records are avro objects. This is indeed the case for nested avro records containing arrays, maps, avro records, etc. There is already a TODO comment here <https://github.com/apache/incubator-paimon/blob/master/paimon-flink/paimon-flink-cdc/src/main/java/org/apache/paimon/flink/sink/cdc/CdcRecordUtils.java#L102> noting that we need to either extend TypeUtils to handle such types or change the CdcRecord.fields map so that its values are not limited to String. My branch in [2] took the former approach. Of course, I also needed to change the DebeziumAvroParser to handle such types (rather than convert them to String).
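To make the trade-off between the two approaches concrete, here is a minimal, hypothetical Java sketch. The class and method names below are illustrative only and do not match Paimon's actual API; it just shows why String-valued fields lose nested Avro values, and the shape of the two fixes discussed above:

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch (not Paimon's real classes): contrasts the two ways
// to carry nested Avro values through the CDC pipeline.
public class CdcFieldsSketch {

    // Option 1: widen the fields map's value type so nested structures
    // (arrays, maps, records) survive as-is instead of being stringified.
    record CdcRecordWide(Map<String, Object> fields) {}

    // Option 2: keep String values and extend the cast utility so the sink
    // can rebuild typed values from their serialized form.
    public static Object castFromString(String value, String typeHint) {
        switch (typeHint) {
            case "INT":
                return Integer.parseInt(value);
            case "ARRAY":
                // A real implementation would parse the serialized array
                // back into a typed list; comma-split is a stand-in here.
                return List.of(value.split(","));
            default:
                return value;
        }
    }

    public static void main(String[] args) {
        // Option 1: the nested value stays an Object, no lossy round-trip.
        CdcRecordWide wide =
                new CdcRecordWide(Map.of("tags", List.of("a", "b")));
        System.out.println(wide.fields().get("tags")); // [a, b]

        // Option 2: a String payload is cast back to a typed value.
        System.out.println(castFromString("a,b", "ARRAY")); // [a, b]
    }
}
```

Either way, the parser side (the DebeziumAvroParser in this case) has to stop flattening nested Avro objects to String before they reach the record's fields map.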
> I will continue on Debezium-avro format in 0.8.0

Thanks for working on this. I am fine with debezium avro being available in
0.8. One thing that would be nice is if you could rebase branch [1] on
master; then I can continue working off it in the meanwhile, as the current
branch [2] is based on [1] and has diverged quite a bit from master.

Thanks,
Umesh

On Sun, Jan 21, 2024 at 8:43 PM yu zelin <[email protected]> wrote:

> Hi Umesh,
>
> Recently I'm working on supporting the Confluent debezium avro format
> in Kafka CDC based on [1]. But the Paimon community is planning to cut
> the 0.7.0 release branch on Jan. 25th, and I think there is not enough
> time for me to complete the job before the deadline, for a few reasons:
>
> 1. I have to modify the current CDC framework. Currently, the intermediate
> records from the Kafka source are string type, but for debezium-avro the
> intermediate records are avro objects, so we have to adjust the framework.
> That needs some time.
>
> 2. I noticed that you want to support some complex types in [2], which
> made some changes to TypeUtils. Since this util is used by many features,
> we should do some tests to see if the changes are compatible with other
> features. I think if we implement a simple version in this release which
> doesn't support those complex types, the release cannot meet your needs.
> So I suggest that you continue to use the jar built by yourself.
>
> Recently I'm also working on preparing the 0.7.0 release. I will continue
> on the Debezium-avro format in 0.8.0. If you have any problems with [1],
> you are welcome to discuss them with us on the mailing list.
>
> Best,
> Zelin Yu
>
> [1] https://github.com/apache/incubator-paimon/pull/2070
> [2] https://github.com/harveyyue/incubator-paimon/pull/1
>
> On Jan 10, 2024 at 01:21, umesh dangat <[email protected]> wrote:
>
> Hello,
>
> I am a software engineer at Yelp Inc and lead the data infrastructure
> group at Yelp.
> We have a complex real-time streaming ecosystem comprising Flink, Kafka,
> and our custom schema registry service. I am trying to evaluate Apache
> Paimon as a potential replacement for many of our data pipelines,
> involving streaming reads, joins, and aggregations, to help minimize our
> growing operational complexity and cost. Paimon also seems to solve the
> schema evolution problem better than the Flink SQL client (which we use
> currently).
>
> One issue with integrating Paimon into our ecosystem is that it does not
> support the debezium avro format. However, Jingsong Li pointed me to this
> <https://github.com/apache/incubator-paimon/pull/2070> branch that does
> seem to add support for the debezium avro format using the Confluent
> schema registry. This would allow us to ingest our data from Kafka into
> Paimon and then evaluate it.
>
> I wanted to know if there are plans to push this branch to master soonish.
> I can help with reviewing, since I plan to consume data written using this
> format for some of our production workflows.
>
> Thanks,
> Umesh
