adriangb commented on PR #18192:
URL: https://github.com/apache/datafusion/pull/18192#issuecomment-3447731663

   Yeah that makes sense! I'm not sure how to encode that into the APIs, but 
making it very explicit in the docs, etc. should be enough?
   
   One thing I'm thinking is if there's a way to satisfy all of the input here 
by making a new high level API.
   An annoyance I've had with the Codec system is that it's not clear how to 
inject extra behavior, it only functions as a fallback.
   
   Something along the lines of:
   
   ```rust
   pub trait Decoder {
      fn decode_plan(&self, plan: PhysicalPlanNode) -> Result<Arc<dyn 
ExecutionPlan>>;
      fn decode_expression(&self, expression: PhysicalExprNode) -> 
Result<Arc<dyn ExpressionNode>>;
   }
   
   pub struct DefaultDecoder {
     ctx: TaskContext,
   }
   
   impl DefaultDecoder {
     pub fn new(ctx: TaskContext) -> Self {
       Self { ctx }
     }
   
   impl Decoder for DefaultDecoder {
      fn decode_plan(decoder: & dyn Decoder, plan: PhysicalPlanNode) -> 
Result<Arc<dyn ExecutionPlan>> {
         // Essentially the code inside of 
`PhysicalPlanNode::try_from_physical_plan`
        // but passing around a reference to ourselves as `&dyn Decoder` so 
that e.g. if we have to decode
        // predicates inside of a plan it calls back into `decode_expression`
        // Maybe delegates to 
      }
   
      fn decode_expression(decoder: &dyn Decoder expression: ExpressionNode, 
input_schema: &Schema) -> Result<Arc<dyn ExpressionNode>> {
         // essentially the code inside of `parse_physical_expr` but again 
passing around a reference to ourselves
      }
   }
   ```
   
   Then it's easy to make a custom `DefaultDecoder` that injects before/after 
behavior, e.g. caching or re-attaching custom bits to plans. I'd imagine we get 
rid of the `Codec` bit and have defaults just error if you don't match and 
handle extension/custom types yourself.
   
   Anyway that's a half baked idea and that discussion may be a blocker for 
this PR but I think it is largely unrelated to "is the deduplication worth 
doing by default", I'll address that in my next comment.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to