cbb330 opened a new issue, #49362:
URL: https://github.com/apache/arrow/issues/49362

   ### Summary
   
   Part 3 of ORC predicate pushdown (#48986). Depends on #49361.
   
   Extend the initial INT32/INT64 greater-than implementation to cover all comparison operators, logical operators, set operations, and null handling.
   
   ### Operator coverage
   
   The guarantee-based approach means most operators work automatically once the guarantee expression is correct. The work here is:
   1. Ensuring guarantee expressions handle null semantics correctly for each operator class
   2. Adding test coverage for each operator
   3. Handling edge cases specific to certain operators (e.g., IN with mixed types)
   
   | Category | Operators | Notes |
   |----------|-----------|-------|
   | Comparison | `>`, `>=`, `<`, `<=`, `==`, `!=` | All work via `SimplifyWithGuarantee()` with min/max range guarantees |
   | Logical | `AND`, `OR`, `NOT` | Compound predicates; Arrow's simplifier handles these given correct per-field guarantees |
   | Set | `IN` | Range intersection: if all IN values fall outside [min, max], skip stripe |
   | Null | `IS NULL`, `IS NOT NULL` | Use `hasNull()` and `getNumberOfValues() == 0` from ORC stats |
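   
   As a rough illustration of the table above, the sketch below models the skip decision for a single INT64 column using only the standard library. It is a simplified stand-in, not the Arrow or ORC API: `StripeStats` and the `CanSkip*` helpers are hypothetical names, and the actual implementation is expected to go through `SimplifyWithGuarantee()` rather than hand-written checks like these.
   
   ```cpp
   #include <algorithm>
   #include <cstdint>
   #include <vector>
   
   // Hypothetical, simplified per-stripe statistics for one INT64 column.
   // Field names are illustrative only.
   struct StripeStats {
     int64_t min;
     int64_t max;
     bool has_null;        // models ORC's hasNull()
     uint64_t num_values;  // models ORC's getNumberOfValues() (non-null count)
   };
   
   // Each helper returns true only when no row in the stripe can satisfy
   // the predicate. When num_values == 0 every value is null, so a
   // comparison can never evaluate to true and min/max are not consulted.
   bool CanSkipGreater(const StripeStats& s, int64_t v) {
     return s.num_values == 0 || s.max <= v;
   }
   
   bool CanSkipEqual(const StripeStats& s, int64_t v) {
     return s.num_values == 0 || v < s.min || v > s.max;
   }
   
   // IN: skip only if every listed value lies outside [min, max].
   bool CanSkipIn(const StripeStats& s, const std::vector<int64_t>& values) {
     if (s.num_values == 0) return true;
     return std::all_of(values.begin(), values.end(),
                        [&](int64_t v) { return v < s.min || v > s.max; });
   }
   
   // IS NULL can be skipped when the stripe holds no nulls.
   bool CanSkipIsNull(const StripeStats& s) { return !s.has_null; }
   
   // IS NOT NULL can be skipped when the stripe holds only nulls.
   bool CanSkipIsNotNull(const StripeStats& s) { return s.num_values == 0; }
   ```
   
   For a stripe with `min = 100`, `max = 200` and no nulls, `id IN (1, 500)` and `id IS NULL` can both be skipped, while `id IN (1, 150)` cannot.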
   
   ### Future type extensions
   
   This sub-issue covers operators for INT32/INT64. Extending to additional types is a follow-up:
   
   | Type | Key concern |
   |------|------------|
   | DOUBLE, FLOAT | NaN in statistics makes range unusable; ±Inf are valid bounds |
   | STRING | ORC may truncate long strings in statistics; collation/encoding assumptions |
   | DATE | int32 days since epoch — straightforward |
   | TIMESTAMP | Unit conversion (ORC millis + sub-millis nanos → Arrow nanos) |
   | DECIMAL | Scale/precision must match between stats and field type |
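   
   Two of the concerns in this table can be made concrete with a small sketch. The helper names below are hypothetical and the code does not use the ORC or Arrow APIs; it only assumes that floating-point bounds arrive as `double` and that timestamp statistics arrive as epoch milliseconds plus a sub-millisecond nanosecond remainder in `[0, 999999]`.
   
   ```cpp
   #include <cmath>
   #include <cstdint>
   
   // A floating-point [min, max] range is usable for pruning only when
   // neither bound is NaN. Infinite bounds are ordinary, valid bounds.
   bool IsUsableDoubleRange(double min, double max) {
     return !std::isnan(min) && !std::isnan(max) && min <= max;
   }
   
   // Combine epoch milliseconds and a sub-millisecond nanosecond remainder
   // into epoch nanoseconds. Overflow is not checked here; a real
   // implementation would need to fall back to "no skip" on overflow.
   int64_t ToEpochNanos(int64_t millis, int32_t sub_milli_nanos) {
     return millis * 1000000 + sub_milli_nanos;
   }
   ```
   
   When `IsUsableDoubleRange` returns false, the conservative choice is to emit no guarantee for that field so the stripe is always read.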
   
   ### Tests
   
   - Each comparison operator individually (>, >=, <, <=, ==, !=)
   - AND compound predicate (both conditions must hold)
   - OR compound predicate (either condition suffices)
   - NOT operator
   - IN operator with values inside/outside stripe range
   - IS NULL on stripe with/without nulls
   - IS NOT NULL on all-null stripe
   - Compound: `(id > 100 AND id < 200) OR id == 500`
   - Unsupported type in predicate → conservative include (no skip)
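   
   To make the expected results for the compound cases easier to reason about, here is a minimal standalone model of how `AND`, `OR`, and `NOT` combine per-stripe outcomes. It is a sketch under simplifying assumptions (a single INT64 column described only by its min/max), not Arrow's simplifier; `Outcome`, `Range`, and every function name are hypothetical.
   
   ```cpp
   #include <cstdint>
   
   // What the stripe statistics prove about a predicate:
   //   kNeverTrue  - no row evaluates to true  (stripe can be skipped)
   //   kNeverFalse - no row evaluates to false (rows are true or null)
   //   kMaybe      - nothing can be proven
   enum class Outcome { kNeverTrue, kNeverFalse, kMaybe };
   
   struct Range { int64_t min; int64_t max; };  // non-null value range
   
   Outcome Greater(const Range& r, int64_t v) {
     if (r.max <= v) return Outcome::kNeverTrue;
     if (r.min > v) return Outcome::kNeverFalse;
     return Outcome::kMaybe;
   }
   
   Outcome Less(const Range& r, int64_t v) {
     if (r.min >= v) return Outcome::kNeverTrue;
     if (r.max < v) return Outcome::kNeverFalse;
     return Outcome::kMaybe;
   }
   
   Outcome Equal(const Range& r, int64_t v) {
     if (v < r.min || v > r.max) return Outcome::kNeverTrue;
     if (r.min == v && r.max == v) return Outcome::kNeverFalse;
     return Outcome::kMaybe;
   }
   
   // AND is true only if both sides are true, and false if either is false.
   Outcome And(Outcome a, Outcome b) {
     if (a == Outcome::kNeverTrue || b == Outcome::kNeverTrue) return Outcome::kNeverTrue;
     if (a == Outcome::kNeverFalse && b == Outcome::kNeverFalse) return Outcome::kNeverFalse;
     return Outcome::kMaybe;
   }
   
   // OR is true if either side is true, and false only if both are false.
   Outcome Or(Outcome a, Outcome b) {
     if (a == Outcome::kNeverTrue && b == Outcome::kNeverTrue) return Outcome::kNeverTrue;
     if (a == Outcome::kNeverFalse || b == Outcome::kNeverFalse) return Outcome::kNeverFalse;
     return Outcome::kMaybe;
   }
   
   // NOT swaps the two proven outcomes and leaves kMaybe alone.
   Outcome Not(Outcome a) {
     if (a == Outcome::kNeverTrue) return Outcome::kNeverFalse;
     if (a == Outcome::kNeverFalse) return Outcome::kNeverTrue;
     return Outcome::kMaybe;
   }
   
   bool CanSkip(Outcome o) { return o == Outcome::kNeverTrue; }
   
   // (id > 100 AND id < 200) OR id == 500
   Outcome ExamplePredicate(const Range& r) {
     return Or(And(Greater(r, 100), Less(r, 200)), Equal(r, 500));
   }
   ```
   
   Under this model a stripe with ids in [300, 400] is skipped for the compound predicate, while stripes covering [150, 160] or [450, 600] are read. An unsupported type would map to `kMaybe`, which never skips.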
   
   ### Component(s)
   
   C++

