[GitHub] [arrow-datafusion] liukun4515 commented on issue #3031: optimize/simplify the literal data type and remove unnecessary cast、try_cast

GitBox Wed, 17 Aug 2022 05:07:56 -0700


liukun4515 commented on issue #3031:
URL: 
https://github.com/apache/arrow-datafusion/issues/3031#issuecomment-1217923520


   > Here is a self contained reproducer for anyone following along:
   > 
   > ```sql
   > ❯ create table foo as select column1 as d from (values (1), (2));
   > +---+
   > | d |
   > +---+
   > | 1 |
   > | 2 |
   > +---+
   > 2 rows in set. Query took 0.005 seconds.
   > ❯ create table bar as select cast (d as decimal) as d from foo;
   > +--------------+
   > | d            |
   > +--------------+
   > | 1.0000000000 |
   > | 2.0000000000 |
   > +--------------+
   > 2 rows in set. Query took 0.005 seconds.
   > ❯ explain select * from bar where d = 1.4;
   > 
   > +---------------+-----------------------------------------------------------------------------------+
   > | plan_type     | plan                                                                              |
   > +---------------+-----------------------------------------------------------------------------------+
   > | logical_plan  | Projection: #bar.d                                                                |
   > |               |   Filter: #bar.d = Float64(1.4)                                                   |
   > |               |     TableScan: bar projection=[d]                                                 |
   > | physical_plan | ProjectionExec: expr=[d@0 as d]                                                   |
   > |               |   CoalesceBatchesExec: target_batch_size=4096                                     |
   > |               |     FilterExec: CAST(d@0 AS Decimal128(38, 15)) = CAST(1.4 AS Decimal128(38, 15)) |
   > |               |       RepartitionExec: partitioning=RoundRobinBatch(16)                           |
   > |               |         MemoryExec: partitions=1, partition_sizes=[1]                             |
   > |               |                                                                                   |
   > +---------------+-----------------------------------------------------------------------------------+
   > 2 rows in set. Query took 0.005 seconds.
   > ```
   > 
   > The FilterExec line above should not have the CAST operations in them
   
   The `cast` is added when the physical expr/physical plan is created.
   It follows a general rule for coercing the two sides of a binary comparison.
   For example:
   
   INT32  < INT64 -> INT64 
   
   DECIMAL(10,2) < DOUBLE -> another (wider) decimal data type
   
   I think this is all in the `comparison_binary_numeric_coercion` function.
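
The two rules above can be illustrated with a small, self-contained sketch. This is not the body of `comparison_binary_numeric_coercion`: `ToyType`, `as_decimal` and `coerce_for_comparison` are hypothetical names, and the decimal widths assigned to `Int32`, `Int64` and `Float64` are assumptions, chosen so that comparing a `Decimal(38, 10)` column with a `Float64` literal gives the `Decimal128(38, 15)` seen in the plan above.

```rust
/// Toy data types (hypothetical; not arrow's `DataType`).
#[derive(Debug, Clone, Copy, PartialEq)]
enum ToyType {
    Int32,
    Int64,
    Float64,
    /// Decimal(precision, scale)
    Decimal(u8, u8),
}

/// Maximum precision of a 128-bit decimal.
const MAX_PRECISION: u8 = 38;

/// View a numeric type as a (precision, scale) pair.
/// The widths for the non-decimal types are assumptions of this sketch.
fn as_decimal(t: ToyType) -> (u8, u8) {
    match t {
        ToyType::Int32 => (10, 0),
        ToyType::Int64 => (20, 0),
        ToyType::Float64 => (30, 15),
        ToyType::Decimal(p, s) => (p, s),
    }
}

/// Pick the common type that both sides of a comparison are cast to.
fn coerce_for_comparison(lhs: ToyType, rhs: ToyType) -> ToyType {
    use ToyType::*;
    match (lhs, rhs) {
        // Same type on both sides: nothing to cast.
        (a, b) if a == b => a,
        // Any decimal involved: widen to a decimal that covers both sides.
        (Decimal(..), _) | (_, Decimal(..)) => {
            let (p1, s1) = as_decimal(lhs);
            let (p2, s2) = as_decimal(rhs);
            let scale = s1.max(s2);
            let int_digits = (p1 - s1).max(p2 - s2);
            Decimal((int_digits + scale).min(MAX_PRECISION), scale)
        }
        // Float against an integer: compare as float.
        (Float64, _) | (_, Float64) => Float64,
        // Remaining pair is Int32 vs Int64: widen to Int64.
        _ => Int64,
    }
}

fn main() {
    println!("{:?}", coerce_for_comparison(ToyType::Int32, ToyType::Int64));
    println!("{:?}", coerce_for_comparison(ToyType::Decimal(38, 10), ToyType::Float64));
}
```

Because the common type differs from the column's type, both the column and the literal end up wrapped in a `CAST`, which is what the `FilterExec` line shows.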
   
   This is just the general rule, and it works well in all cases.
   But in many use cases we just compare against a literal in the filter expr 
or another condition, as in this issue. The new optimizer rule should handle 
this case the way Spark does.
   
   I have filed a draft PR which adds a logical optimizer rule to do this, but 
it may not be ready until tomorrow because I need to review some of the plan 
changes myself first.  I think the rule can work well for us.
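
A minimal sketch of the idea behind such a rule, limited to integer types to keep it short: when one side of the comparison is a literal, try to convert the literal to the column's type at planning time, and leave the generic coercion in place when the conversion would lose information. `Literal`, `ColumnType` and `cast_literal_to_column_type` are hypothetical names for illustration only; they are not taken from the draft PR or from DataFusion.

```rust
/// Toy literal values (hypothetical; not DataFusion's `ScalarValue`).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Literal {
    Int32(i32),
    Int64(i64),
}

/// Toy column types (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
enum ColumnType {
    Int32,
    Int64,
}

/// Try to express `lit` in the column's own type without losing information.
/// `Some(..)` means the comparison can be rewritten as `col = <new literal>`
/// with no cast on the column; `None` means the generic coercion must stay.
fn cast_literal_to_column_type(lit: Literal, col: ColumnType) -> Option<Literal> {
    match (lit, col) {
        // Already the column's type: keep the literal as it is.
        (Literal::Int32(_), ColumnType::Int32) | (Literal::Int64(_), ColumnType::Int64) => {
            Some(lit)
        }
        // Widening is always lossless.
        (Literal::Int32(v), ColumnType::Int64) => Some(Literal::Int64(i64::from(v))),
        // Narrowing is only allowed when the value fits in the target type.
        (Literal::Int64(v), ColumnType::Int32) => {
            if v >= i64::from(i32::MIN) && v <= i64::from(i32::MAX) {
                Some(Literal::Int32(v as i32))
            } else {
                None
            }
        }
    }
}

fn main() {
    // `CAST(c AS Int64) = Int64(1)` could become `c = Int32(1)`.
    println!("{:?}", cast_literal_to_column_type(Literal::Int64(1), ColumnType::Int32));
    // The literal does not fit in Int32, so the cast on the column has to stay.
    println!("{:?}", cast_literal_to_column_type(Literal::Int64(i64::MAX), ColumnType::Int32));
}
```

For the decimal case in this issue the same check would ask whether `1.4` can be represented exactly at the column's scale, so that the filter compares `d` directly with a decimal literal.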
   
   @alamb 


