kaori-seasons commented on PR #1923:
URL: https://github.com/apache/fluss/pull/1923#issuecomment-3509311051

   > Hi @kaori-seasons, we are very interested in this feature so thank you for 
driving this effort!
   > 
   > I am curious about the way you are going to support the union reads in 
Trino: I see that in this draft you were going to implement a custom page 
source that will delegate reads to the corresponding lakehouse. However, I 
think that it will require quite a lot of effort, plus this will likely require 
re-implementing a lot of other features (like dynamic filters, partition 
pruning, etc.) that are already present in existing lakehouse connectors. I was 
thinking of discussing a different approach. Suppose Trino supported the union 
read at the API level, something along the following lines:
   > 
   > 1. The `Metadata` interface would have another method `splitForUnionRead` 
which would return a Trino table path + table version (snapshot ID) that should 
be used for the union read
   > 2. When this result is returned to Trino planner, it would replace the 
table scan with a `Union` node, which has the fluss table scan as one leg, and 
lakehouse table scan as another leg. A plugin cannot instantiate a TableHandle 
of another plugin directly, but this can be implemented inside a Trino planner 
rule
   > 3. Trino will use all existing optimizations for the corresponding 
connector: filters pushdown, partition pruning, etc and it will handle the 
interaction with data lakes - no other steps are required from the Fluss 
connector
   > 
   > I think this change may benefit other connectors that try to offload 
storage to different sources as well (I can imagine building e.g. a postgres 
CDC to a lakehouse)
   > 
   > Do you think this is a viable approach and worth discussing with the 
Trino community?
   
   Hi @agoncharuk, I'm glad you're interested in this feature. I apologize for 
the late reply; I was away on a business trip over the weekend. In general I 
agree with your suggestion, but I implemented the dynamic filtering and 
partition pruning features to support an adaptive strategy, which lets the 
connector judge data freshness based on the actual business scenario. Here's 
a compromise; I'm not sure if you'd be okay with it:
   
   1. **Retain the advantages of the existing implementation:** Keep the 
intelligent strategy selection and adaptive optimization features.
   
   2. **Enhance integration with the Trino planner:** Add a suggestion method 
to the Metadata interface, but still retain our optimization logic.
   
   ```java
   import java.util.Optional;

   // A similar method could be added to FlussMetadata
   public interface FlussMetadata {
       // Existing methods...

       // New method to support union reads at the Trino planner level:
       // returns the Trino table path + table version (snapshot ID) so the
       // planner can replace the table scan with a Union node.
       default Optional<UnionReadInfo> splitForUnionRead(String tableName) {
           return Optional.empty();
       }
   }
   ```
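   Note that `UnionReadInfo` above is not an existing Trino or Fluss type; as a 
rough illustration, it could be a small immutable value holder for the two 
pieces from your step 1 (a hypothetical sketch; the class name and fields are 
my assumptions):

   ```java
   import java.util.Objects;

   // Hypothetical value holder; the name and fields are illustrative only.
   final class UnionReadInfo {
       private final String lakeTablePath; // Trino path of the lakehouse table
       private final long snapshotId;      // table version pinned for the union read

       UnionReadInfo(String lakeTablePath, long snapshotId) {
           this.lakeTablePath = Objects.requireNonNull(lakeTablePath, "lakeTablePath");
           this.snapshotId = snapshotId;
       }

       String lakeTablePath() { return lakeTablePath; }
       long snapshotId() { return snapshotId; }
   }
   ```

   The planner rule from your step 2 could then read these two fields to 
construct the lakehouse-side scan leg of the `Union` node.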
   
   What are your thoughts on this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
