comphead commented on PR #2246: URL: https://github.com/apache/datafusion-comet/pull/2246#issuecomment-3228649992
The test fails because of how the Parquet data file is read. The test data is written manually with the Parquet writer rather than through the DataFrame API, and the output schema is

```
message root {
  optional boolean _1;
  optional int32 _2 (INTEGER(8,true));
  optional int32 _3 (INTEGER(16,true));
  optional int32 _4;
  optional int64 _5;
  optional float _6;
  optional double _7;
  optional binary _8 (STRING);
  optional int32 _9 (INTEGER(8,false));
  optional int32 _10 (INTEGER(16,false));
  optional int32 _11 (INTEGER(32,false));
  optional int64 _12 (INTEGER(64,false));
  optional binary _13 (ENUM);
  optional fixed_len_byte_array(3) _14;
  optional int32 _15 (DECIMAL(5,2));
  optional int64 _16 (DECIMAL(18,10));
  optional fixed_len_byte_array(16) _17 (DECIMAL(38,37));
  optional int64 _18 (TIMESTAMP(MILLIS,true));
  optional int64 _19 (TIMESTAMP(MICROS,true));
  optional int32 _20 (DATE);
  optional binary _21;
  optional int32 _id;
}
```

Apparently Spark and DF/DuckDB interpret the rollover types (byte, short, etc.) differently, such as

```
  optional int32 _2 (INTEGER(8,true));
  optional int32 _3 (INTEGER(16,true));
  optional int32 _9 (INTEGER(8,false));
  optional int32 _10 (INTEGER(16,false));
```

Reading the same Parquet file from Spark, DF, and DuckDB gives different results.

Spark
```
scala> sql("select _2, _10 from t1").show(false)
+----+----+
|_2  |_10 |
+----+----+
|NULL|NULL|
|1   |-1  |
|2   |-2  |
+----+----+
```

DF
```
> select _2, _10 from t2;
+------+-------+
| _2   | _10   |
+------+-------+
| NULL | NULL  |
| 1    | 65535 |
| 2    | 65534 |
+------+-------+
3 row(s) fetched.
Elapsed 0.005 seconds
```

DuckDB
```
D select * from '/tmp/test.parquet';
Invalid Input Error: Failed to cast value: Type UINT32 with value 4294967294 can't be cast because the value is out of range for the destination type UINT8
```

Moving the test to more stable data types.
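The three results above are consistent with three different ways of interpreting the same stored `int32` value. The sketch below only illustrates that arithmetic. It assumes the test file holds small negative values such as -1 and -2 in the unsigned-annotated columns, as the Spark output suggests. The function names are hypothetical and do not correspond to code in Spark, DataFusion, or DuckDB.

```python
def read_as_stored(value: int) -> int:
    """Return the signed int32 physical value unchanged (matches the Spark output)."""
    return value


def wrap_to_unsigned(value: int, bits: int) -> int:
    """Keep only the low `bits` bits and treat them as unsigned (matches the DF output)."""
    return value & ((1 << bits) - 1)


def checked_cast_to_unsigned(value: int, bits: int) -> int:
    """View the int32 as UINT32, then reject values that do not fit in `bits` bits."""
    as_uint32 = value & 0xFFFFFFFF
    if as_uint32 >= (1 << bits):
        raise OverflowError(
            f"UINT32 value {as_uint32} is out of range for UINT{bits}"
        )
    return as_uint32


# Assumed stored values for column _10, annotated INTEGER(16,false).
for stored in (1, -1, -2):
    print(stored, read_as_stored(stored), wrap_to_unsigned(stored, 16))

# Assumed stored value for column _9, annotated INTEGER(8,false).
try:
    checked_cast_to_unsigned(-2, 8)
except OverflowError as err:
    print(err)
```

Under these assumptions, wrapping -1 and -2 to 16 bits gives 65535 and 65534, and -2 viewed as UINT32 is 4294967294, which does not fit in UINT8.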
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org