alamb commented on code in PR #9184:
URL: https://github.com/apache/arrow-datafusion/pull/9184#discussion_r1485579242


##########
datafusion/core/src/physical_optimizer/pruning.rs:
##########
@@ -182,6 +191,179 @@ pub trait PruningStatistics {
 /// ```
 ///
 /// See [`PruningPredicate::try_new`] and [`PruningPredicate::prune`] for more 
information.
+///
+/// # Background
+///
+/// ## Boolean Tri-state logic
+///
+/// To understand the details of the rest of this documentation, it is 
important
+/// to understand how the tri-state boolean logic in SQL works. As this is
+/// somewhat esoteric, we review it here.
+///
+/// SQL has a notion of `NULL` that represents the value is `“unknown”` and 
this
+/// uncertainty propagates through expressions. SQL `NULL` behaves very
+/// differently than the `NULL` in most other languages where it is a special,
+/// sentinel value (e.g. `0` in `C/C++`). While representing uncertainty with
+/// `NULL` is powerful and elegant, SQL `NULL` s are often deeply confusing 
when
+/// first encountered as they behave differently than most programmers may
+/// expect.
+///
+/// In most other programming languages,
+/// * `a == NULL` evaluates to `true` if `a` also had the value `NULL`
+/// * `a == NULL` evaluates to `false` if a has any other value
+///
+/// However, in SQL `a = NULL` **always** evaluates to `NULL` (never `true` or 
`false`):
+///
+/// | Expression     | Result   |
+/// | ------------- | --------- |
+/// | `1 = NULL`    | `NULL`    |
+/// | `NULL = NULL` | `NULL`    |
+///
+/// Also important is how `AND` and `OR` works with tri-state boolean logic as
+/// (perhaps counterintuitively) the result is **not** always NULL. While
+/// consistent with the notion of `NULL` representing “unknown”, this is again,
+/// often deeply confusing 🤯 when first encountered.
+///
+/// | Expression       | Result    | Intuition   |
+/// | ---------------  | --------- | ----------- |
+/// | `NULL AND true`  |   `NULL`  | The `NULL` stands for “unknown” and if it 
were `true` or `false` the overall expression value could change |
+/// | `NULL AND false` |  `false`  | If the `NULL` was either `true` or 
`false` the overall expression is still `false` |
+/// | `NULL AND NULL`  | `NULL`    |            |
+///
+/// | Expression      | Result    | Intuition |
+/// | --------------- | --------- | ---------- |
+/// | `NULL OR true`  | `true`    |  If the `NULL` was either `true` or 
`false` the overall expression is still `true` |
+/// | `NULL OR false` | `NULL`    |  The `NULL` stands for “unknown” and if it 
were `true` or `false` the overall expression value could change |
+/// | `NULL OR NULL`  |  `NULL`   |            |
+///
+/// ## SQL Filter Semantics
+///
+/// The SQL `WHERE` clause has a boolean expression, often called a filter or
+/// predicate. The semantics of this predicate are that the query evaluates the
+/// predicate for each row in the input tables and:
+///
+/// * Rows that evaluate to `true` are returned in the query results
+/// * Rows that evaluate to `false` are not returned (“filtered out” or 
“pruned” or “skipped”).
+/// * Rows that evaluate to `NULL` are **NOT** returned (also “filtered out”) 
– *this property appears many times in the discussion below*

Review Comment:
   This is a very good observation.
   
   Yes the NULL semantics in filter expressions is the opposite of what the 
rewritten pruning predicate does with NULL (which is quite confusing). I added 
a note trying to clarify this: 74261e8e3



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to