dmitrybugakov commented on code in PR #10510:
URL: https://github.com/apache/datafusion/pull/10510#discussion_r1602926580


##########
datafusion/expr/src/interval_arithmetic.rs:
##########
@@ -1469,6 +1472,8 @@ pub enum NullableInterval {
     MaybeNull { values: Interval },
     /// The value is definitely not null, and is within the specified range.
     NotNull { values: Interval },
+    /// Added to handle cases with insufficient statistics
+    Unknown,
 }

Review Comment:
   @alamb 
   I ran some additional tests on several queries and logged the column statistics for each:
   
   ```
   CREATE TABLE data_table (
       id INT,
       value INT
   );
   ```
   
   ```
   INSERT INTO data_table (id, value) VALUES
   (1, 100),
   (2, 200),
   (3, 300),
   (4, 400),
   (5, 500),
   (6, 600),
   (7, 700),
   (8, 800),
   (9, 900),
   (10, 1000);
   ```
   
   ```
   SELECT id, value FROM data_table WHERE value > 500; 
   ```
   
   _Log:_ 
   `Schema: id: Int32, value: Int32`
   `Column 0: ColumnStatistics { null_count: Inexact(0), max_value: Exact(Int32(NULL)), min_value: Exact(Int32(NULL)), distinct_count: Absent }`
   `Column 1: ColumnStatistics { null_count: Inexact(0), max_value: Inexact(Int32(NULL)), min_value: Inexact(Int32(501)), distinct_count: Absent }`
   
   ```
   SELECT AVG(value) AS average_value, SUM(value) AS total_value
   FROM data_table;
   ```
   
   _Log:_
   `Schema: average_value: Float64, total_value: Int64`
   `Column 0: ColumnStatistics { null_count: Absent, max_value: Absent, min_value: Absent, distinct_count: Absent }`
   `Column 1: ColumnStatistics { null_count: Absent, max_value: Absent, min_value: Absent, distinct_count: Absent }`
   
   ```
   SELECT a.id AS id_a, a.value AS value_a, b.id AS id_b, b.value AS value_b
   FROM data_table a
   CROSS JOIN data_table b;
   ```
   _Log:_
   `Schema: id_a: Int32, value_a: Int32, id_b: Int32, value_b: Int32`
   `Column 0: ColumnStatistics { null_count: Exact(0), max_value: Absent, min_value: Absent, distinct_count: Absent }`
   `Column 1: ColumnStatistics { null_count: Exact(0), max_value: Absent, min_value: Absent, distinct_count: Absent }`
   `Column 2: ColumnStatistics { null_count: Exact(0), max_value: Absent, min_value: Absent, distinct_count: Absent }`
   `Column 3: ColumnStatistics { null_count: Exact(0), max_value: Absent, min_value: Absent, distinct_count: Absent }`
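
To make the three logged cases easier to compare, here is a rough, self-contained sketch of one way such statistics could be sorted into "maybe null", "not null", and "unknown" buckets. It is not the PR's implementation and does not use DataFusion's real types: `Precision`, `ColumnStats`, `Bounds`, `bound`, and `classify` are simplified, hypothetical stand-ins, `Option<i32>` models a possibly-NULL `Int32` scalar, and the rule "everything `Absent` means `Unknown`" is an assumption chosen only for illustration.

```rust
// Simplified stand-ins for illustration only; these are NOT DataFusion's types.
#[derive(Debug, Clone, PartialEq)]
enum Precision<T> {
    Exact(T),
    Inexact(T),
    Absent,
}

// One column's statistics; `None` inside min/max models an Int32(NULL) scalar.
#[derive(Debug, Clone, PartialEq)]
struct ColumnStats {
    null_count: Precision<usize>,
    min_value: Precision<Option<i32>>,
    max_value: Precision<Option<i32>>,
}

// Hypothetical result type, loosely mirroring the three enum variants in the diff.
#[derive(Debug, PartialEq)]
enum Bounds {
    MaybeNull { min: Option<i32>, max: Option<i32> },
    NotNull { min: Option<i32>, max: Option<i32> },
    Unknown,
}

// Extract a bound; `None` means "unbounded on this side".
// Assumption: an Inexact value is used as if it were a usable bound.
fn bound(p: &Precision<Option<i32>>) -> Option<i32> {
    match p {
        Precision::Exact(v) | Precision::Inexact(v) => *v,
        Precision::Absent => None,
    }
}

// Assumed rule: only when null count, min, and max are all Absent do we give up.
fn classify(stats: &ColumnStats) -> Bounds {
    let nothing_known = matches!(
        (&stats.null_count, &stats.min_value, &stats.max_value),
        (Precision::Absent, Precision::Absent, Precision::Absent)
    );
    if nothing_known {
        return Bounds::Unknown;
    }
    let (min, max) = (bound(&stats.min_value), bound(&stats.max_value));
    match &stats.null_count {
        // Only an exact zero null count proves the column has no nulls.
        Precision::Exact(0) => Bounds::NotNull { min, max },
        _ => Bounds::MaybeNull { min, max },
    }
}

fn main() {
    // Column 1 of the `WHERE value > 500` query.
    let filtered = ColumnStats {
        null_count: Precision::Inexact(0),
        min_value: Precision::Inexact(Some(501)),
        max_value: Precision::Inexact(None),
    };
    // Any column of the AVG/SUM query.
    let aggregated = ColumnStats {
        null_count: Precision::Absent,
        min_value: Precision::Absent,
        max_value: Precision::Absent,
    };
    // Any column of the CROSS JOIN query.
    let cross_joined = ColumnStats {
        null_count: Precision::Exact(0),
        min_value: Precision::Absent,
        max_value: Precision::Absent,
    };
    for stats in [&filtered, &aggregated, &cross_joined] {
        println!("{:?} => {:?}", stats, classify(stats));
    }
}
```

Under these assumptions, only the aggregate query falls into the `Unknown` bucket; the cross join still carries a usable exact null count even though its range is unbounded.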
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
