andygrove commented on issue #938: URL: https://github.com/apache/datafusion-comet/issues/938#issuecomment-2366867722
> This might work ok for tpc-h but tpc-ds data has nulls and the null check is required perhaps? Does ballista know about the nullability of the data?

Yes. In this case the TPC-H data is known not to contain nulls, as the Parquet schema below shows, so the `IsNotNull` check here is redundant. For TPC-DS, where the schema allows nulls, we would still need the check.

```
$ bdt schema lineitem.parquet/
+-----------------+-------------------+-------------+
| column_name     | data_type         | is_nullable |
+-----------------+-------------------+-------------+
| l_orderkey      | Int64             | NO          |
| l_partkey       | Int64             | NO          |
| l_suppkey       | Int64             | NO          |
| l_linenumber    | Int32             | NO          |
| l_quantity      | Decimal128(11, 2) | NO          |
| l_extendedprice | Decimal128(11, 2) | NO          |
| l_discount      | Decimal128(11, 2) | NO          |
| l_tax           | Decimal128(11, 2) | NO          |
| l_returnflag    | Utf8              | NO          |
| l_linestatus    | Utf8              | NO          |
| l_shipdate      | Date32            | NO          |
| l_commitdate    | Date32            | NO          |
| l_receiptdate   | Date32            | NO          |
| l_shipinstruct  | Utf8              | NO          |
| l_shipmode      | Utf8              | NO          |
| l_comment       | Utf8              | NO          |
+-----------------+-------------------+-------------+
```
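To illustrate why the schema makes the check redundant, here is a minimal sketch of a planner rule that rewrites `IsNotNull(col)` to `TRUE` when the schema declares `col` non-nullable, and keeps it otherwise. The `Column`, `Literal`, `IsNotNull`, `And`, and `simplify` names are hypothetical and only model the idea; they are not DataFusion, Comet, or Ballista APIs. Columns missing from the nullability map are conservatively treated as nullable.

```python
from dataclasses import dataclass
from typing import Dict, Union


# Hypothetical, minimal expression model (not a real DataFusion API).
@dataclass(frozen=True)
class Column:
    name: str


@dataclass(frozen=True)
class Literal:
    value: bool


@dataclass(frozen=True)
class IsNotNull:
    expr: "Expr"


@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"


Expr = Union[Column, Literal, IsNotNull, And]


def simplify(expr: Expr, nullable: Dict[str, bool]) -> Expr:
    """Drop IsNotNull checks that the schema proves are always true.

    `nullable` maps column name -> is_nullable, as in the schema output above.
    """
    if isinstance(expr, IsNotNull) and isinstance(expr.expr, Column):
        # Unknown columns default to nullable, so the check is kept.
        if not nullable.get(expr.expr.name, True):
            return Literal(True)
        return expr
    if isinstance(expr, And):
        left = simplify(expr.left, nullable)
        right = simplify(expr.right, nullable)
        # TRUE AND x is x, even under SQL three-valued logic.
        if left == Literal(True):
            return right
        if right == Literal(True):
            return left
        return And(left, right)
    return expr


# Usage: lineitem columns are all NOT NULL, so both checks disappear.
lineitem = {"l_orderkey": False, "l_shipdate": False}
pred = And(IsNotNull(Column("l_shipdate")), IsNotNull(Column("l_orderkey")))
print(simplify(pred, lineitem))  # Literal(value=True)
```

With a nullable column (as in many TPC-DS tables), the same rule leaves the `IsNotNull` filter in place, which matches the point that the check is still needed there.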