andygrove commented on issue #938:
URL: https://github.com/apache/datafusion-comet/issues/938#issuecomment-2366867722

   > This might work OK for TPC-H, but TPC-DS data has nulls, so perhaps the null check is required? Does Ballista know about the nullability of the data?
   
   Yes. The TPC-H data in this case is known not to contain nulls: every column is declared non-nullable in the Parquet schema below, so the `IsNotNull` check here is redundant. For TPC-DS, where the schema allows nulls, we would still need the check.
   
   ```
   $ bdt schema lineitem.parquet/
   +-----------------+-------------------+-------------+
   | column_name     | data_type         | is_nullable |
   +-----------------+-------------------+-------------+
   | l_orderkey      | Int64             | NO          |
   | l_partkey       | Int64             | NO          |
   | l_suppkey       | Int64             | NO          |
   | l_linenumber    | Int32             | NO          |
   | l_quantity      | Decimal128(11, 2) | NO          |
   | l_extendedprice | Decimal128(11, 2) | NO          |
   | l_discount      | Decimal128(11, 2) | NO          |
   | l_tax           | Decimal128(11, 2) | NO          |
   | l_returnflag    | Utf8              | NO          |
   | l_linestatus    | Utf8              | NO          |
   | l_shipdate      | Date32            | NO          |
   | l_commitdate    | Date32            | NO          |
   | l_receiptdate   | Date32            | NO          |
   | l_shipinstruct  | Utf8              | NO          |
   | l_shipmode      | Utf8              | NO          |
   | l_comment       | Utf8              | NO          |
   +-----------------+-------------------+-------------+
   ```
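
To make the reasoning concrete, here is a minimal Python sketch of the decision, not the actual Comet or DataFusion code. The `Field` type, the `needs_is_not_null` helper, and the TPC-DS-like column marked nullable are hypothetical and exist only for illustration. The idea is that an `IsNotNull` filter can only change the result when the schema declares the column nullable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a schema field: a name plus a nullability flag."""
    name: str
    nullable: bool


def needs_is_not_null(schema, column):
    """Return True if an IsNotNull filter on `column` could remove any rows.

    If the schema declares the column non-nullable, the filter is redundant.
    """
    for field in schema:
        if field.name == column:
            return field.nullable
    raise KeyError(f"unknown column: {column}")


# Two columns from the lineitem schema above, both reported as non-nullable.
lineitem = [Field("l_orderkey", False), Field("l_shipdate", False)]

# Assumption for illustration only: a TPC-DS-like table with a nullable column.
tpcds_like = [Field("ss_customer_sk", True)]

print(needs_is_not_null(lineitem, "l_orderkey"))
print(needs_is_not_null(tpcds_like, "ss_customer_sk"))
```

Under these assumptions the check is skipped for `lineitem` and kept for the nullable column. This is only safe if the nullability recorded in the Parquet schema matches the data.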


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

