fqshopify opened a new pull request, #174: URL: https://github.com/apache/flink-connector-kafka/pull/174
Fixes: https://issues.apache.org/jira/browse/FLINK-32609 Implements the `SupportsProjectPushdown` interface for `KafkaDynamicSource`. # Benefits 1. Improved performance - Unneeded columns will be filtered out at an earlier stage of processing (specficially in the `TableSourceScan` node). - The amount of improvement in performance will vary depending on: - The number of columns selected - The number of columns in the source data - The DecodingFormat - etc. 2. Improved resiliency - Currently SQL queries will fail if any field in the table undergoes a breaking schema change, even if the SQL query itself does not depend on that field. - After the changes in this PR, SQL queries will continue to work even if fields that they do not depend on experience breaking schema changes. See `testBreakingSchemaChanges` for an example. - This improvement is generally only applicable to `ProjectableDecodingFormat` that can decode messages independently/dynamically e.g. `json`, `avro-confluent`, `debezium-avro-confluent`. # Limitations 1. We cannot push projections all the way down into Kafka. - Projection pushdown is primarily used as an I/O optimization technique by pushing projections down all the way into the storage layer. - Unfortunately our storage layer, Kafka, does not support projection pushdown. As a result, we are not able to push projections down further than the deserialization step of our Flink pipelines. - We're still able to improve performance by eliminating unnecessary columns at an earlier stage of processing within our Flink pipeline but the performance benefits are relatively lower than they would have been otherwise. # Challenges 1. `AvroDecodingFormat` is not actually projectable - The `AvroDecodingFormat` claims to be a `ProjectableDecodingFormat` but is not actually projectable AFAICT. - It appears that I’m not the first person to discover this issue: FLINK-35324 - Possible remediations: - Fix the `AvroDecodingFormat` to actually be projectable - I think this is impossible based on how Avro works - Change `AvroDecodingFormat` so that it implements just `DecodingFormat` - I think this is the right solution but this is likely a breaking change - Add an optional configuration to disable pushing down projections into the decoder - This is what I've done in the prototype to unblock myself temporarily # Next steps - [ ] Get feedback from the community - [ ] Raise a FLIP if necessary - [ ] Align on a solution for `AvroDecodingFormat` issue - [ ] Clean up code, more unit tests, etc. - [ ] Open up for formal PR reviews (early reviews/comments are still welcome though!) # Reviewing notes Please note that the code included in this PR is currently in the working-prototype stage and is mostly intended to facilitate discussion. I can definitely clean up the code and I'm open to changing things. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
