changsun20 commented on issue #14434:
URL: https://github.com/apache/datafusion/issues/14434#issuecomment-2726143005

   Hi @eliaperantoni, 
   
   Thank you raising this issue. As I explore potential implementations, I’d 
appreciate your insights on a few key points:
   
   **1. Warning Scope for `= NULL` Patterns**
   Could you help confirm the appropriate scope for detecting `= NULL` 
comparisons?
   
   - **Option 1**: Emit warnings universally whenever `= NULL` is parsed.
       - _Potential Risk_: Trigger warnings whenever "= NULL" is parsed. 
However, expressions like `UPDATE users SET password = NULL` would be 
legitimate usage.
       - Note: Since `Dml(Update)` isn't currently supported in DataFusion, 
this might not be an immediate issue but could become one.
   - **Option 2**: Restrict warnings to predicate contexts 
(WHERE/JOIN/HAVING/CASE WHEN clauses).
       - _Tradeoff_: This would be more precise but adds complexity.
   
   **2. Non-Fatal Warning Plumbing**
   You mentioned the need to implement plumbing and collect warnings via 
`Diagnostic::new_warning`. Could you please elaborate more on this? Can you 
briefly outline (or just give me a hint) how to propagate warnings without 
interrupting the execution flow? I recognize this may require deeper code 
context and may to too general to answer—feel free to defer discussion until I 
share a draft implementation!
   
   **3. Span Highlighting Strategy**
   For the query `SELECT * FROM users WHERE password = NULL`, the parsed 
structure shows:
   
   ```json
   BinaryOp {
        left: Identifier(
                Ident {
                        value: "password",
                        quote_style: None,
                        span: Span(Location(1,27)..Location(1,35)),
                },
        ),
        op: Eq,
        right: Value(
                Null,
        ),
   },
   ```
   
   I have come up with some possible approaches:
   
   - **Option 1**: Add `Span` to `op: Eq` and/or `right: Value(Null)` - most 
precise but risks breaking changes elsewhere
   - **Option 2**: Use the `password` identifier’s span - simplest approach, 
but highlights the column name rather than the actual issue ("= NULL").
   - **Option 3**: Use the identifier's `Span` as a starting point and estimate 
the entire expression's span - avoids structural changes but may be inaccurate 
with variable whitespace
   
   What would be the best approach in your opinion?
   
   Thank you for your guidance!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to