agoncharuk commented on PR #1923:
URL: https://github.com/apache/fluss/pull/1923#issuecomment-3508479689

   Hi @kaori-seasons, we are very interested in this feature, so thank you for 
driving this effort!
   
   I am curious about how you plan to support union reads in Trino: I see that 
in this draft you are going to implement a custom page source that delegates 
reads to the corresponding lakehouse. However, I think that will require quite 
a lot of effort, and it will likely mean re-implementing a lot of other 
features (like dynamic filters, partition pruning, etc.) that are already 
present in the existing lakehouse connectors. I would like to discuss a 
different approach: suppose Trino supported union reads at the API level, 
along the following lines:
   1) The `Metadata` interface would have an additional method, 
`splitForUnionRead`, which would return a Trino table path + table version 
(snapshot ID) that should be used for the union read (see the sketch after 
this list)
   2) When this result is returned to the Trino planner, it would replace the 
table scan with a `Union` node that has the Fluss table scan as one leg and 
the lakehouse table scan as the other. A plugin cannot instantiate a 
`TableHandle` of another plugin directly, but this can be implemented inside a 
Trino planner rule
   3) Trino would use all existing optimizations of the corresponding 
connector (filter pushdown, partition pruning, etc.) and would handle the 
interaction with the data lake itself - no other steps are required from the 
Fluss connector
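   
   To make this a bit more concrete, here is a rough sketch of what such an 
SPI addition could look like. Everything below is illustrative and made up for 
this discussion - `UnionReadMetadata`, `UnionReadTarget`, and the exact shape 
of `splitForUnionRead` are not an actual Trino API; only `ConnectorSession` 
and `ConnectorTableHandle` are existing SPI types:
   
   ```java
   import java.util.Optional;
   import java.util.OptionalLong;
   
   import io.trino.spi.connector.ConnectorSession;
   import io.trino.spi.connector.ConnectorTableHandle;
   
   // Hypothetical SPI addition (not part of Trino today): a connector tells the
   // engine which lakehouse table (and snapshot) holds the already-tiered part
   // of the data, so the planner can build the union itself.
   public interface UnionReadMetadata
   {
       // Illustrative result type: the catalog/schema/table the planner should
       // scan through the existing lakehouse connector, plus the snapshot up to
       // which that data is complete. All names here are made up for the sketch.
       record UnionReadTarget(
               String catalogName,
               String schemaName,
               String tableName,
               OptionalLong snapshotId) {}
   
       // Returns empty when the table has no lakehouse tier yet and should be
       // read from Fluss only. Given a non-empty result, a planner rule (inside
       // the engine, not the plugin) would rewrite the Fluss table scan into a
       // Union of the Fluss scan and a scan of the returned lakehouse table, so
       // filter pushdown, dynamic filters, and partition pruning keep working
       // through the lakehouse connector unchanged.
       Optional<UnionReadTarget> splitForUnionRead(ConnectorSession session, ConnectorTableHandle table);
   }
   ```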
   
   I think this change may benefit other connectors that try to offload 
storage to different systems as well (I can imagine building, e.g., a Postgres 
CDC connector that tiers data to a lakehouse).
   
   Do you think this is a viable approach, and is it worth discussing with the 
Trino community?

