comphead commented on PR #2246:
URL: 
https://github.com/apache/datafusion-comet/pull/2246#issuecomment-3228649992

   The test failure is related to the Parquet data file reader. The test 
data is created manually with the Parquet writer, not with the DataFrame API, 
and the output schema is 
   
   ```
   message root {
     optional boolean _1;
     optional int32 _2 (INTEGER(8,true));
     optional int32 _3 (INTEGER(16,true));
     optional int32 _4;
     optional int64 _5;
     optional float _6;
     optional double _7;
     optional binary _8 (STRING);
     optional int32 _9 (INTEGER(8,false));
     optional int32 _10 (INTEGER(16,false));
     optional int32 _11 (INTEGER(32,false));
     optional int64 _12 (INTEGER(64,false));
     optional binary _13 (ENUM);
     optional fixed_len_byte_array(3) _14;
     optional int32 _15 (DECIMAL(5,2));
     optional int64 _16 (DECIMAL(18,10));
     optional fixed_len_byte_array(16) _17 (DECIMAL(38,37));
     optional int64 _18 (TIMESTAMP(MILLIS,true));
     optional int64 _19 (TIMESTAMP(MICROS,true));
     optional int32 _20 (DATE);
     optional binary _21;
     optional int32 _id;
   }
   ```
   
   Apparently Spark and DF/DuckDB have a different understanding of rollover 
types (byte, short, etc.), such as
   
   ```
     optional int32 _2 (INTEGER(8,true));
     optional int32 _3 (INTEGER(16,true));
     optional int32 _9 (INTEGER(8,false));
     optional int32 _10 (INTEGER(16,false));
   ```
   
   Reading the Parquet file from Spark/DF/DuckDB gives different results:
   
   Spark
   
   ```
   scala> sql("select _2, _10 from t1").show(false)
   +----+----+
   |_2  |_10 |
   +----+----+
   |NULL|NULL|
   |1   |-1  |
   |2   |-2  |
   +----+----+
   ```
   
   DF
   ```
   > select _2, _10 from t2;
   +------+-------+
   | _2   | _10   |
   +------+-------+
   | NULL | NULL  |
   | 1    | 65535 |
   | 2    | 65534 |
   +------+-------+
   3 row(s) fetched. 
   Elapsed 0.005 seconds
   ```
   
   DuckDB
   ```
   D select * from '/tmp/test.parquet';
   Invalid Input Error: Failed to cast value: Type UINT32 with value 4294967294 
can't be cast because the value is out of range for the destination type UINT8
   ```
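
The three results above are consistent with the same stored int32 being interpreted in three different ways. The following is only a minimal sketch of that arithmetic, assuming the physical values in the file are small negative int32s such as -1 and -2. The function names are hypothetical and this is not the actual reader code of Spark, DataFusion, or DuckDB.

```python
def as_passthrough_int32(raw: int) -> int:
    """Return the stored int32 unchanged (consistent with the Spark output)."""
    return raw


def as_truncated_unsigned(raw: int, bits: int) -> int:
    """Keep only the low `bits` bits and read them as unsigned
    (consistent with the DF output for _10)."""
    return raw & ((1 << bits) - 1)


def as_checked_unsigned(raw: int, bits: int) -> int:
    """Reinterpret the int32 as uint32, then range-check it against the
    narrower unsigned type (consistent with the DuckDB error)."""
    as_u32 = raw & 0xFFFFFFFF
    if as_u32 >= (1 << bits):
        raise ValueError(
            f"value {as_u32} is out of range for the destination type UINT{bits}"
        )
    return as_u32


if __name__ == "__main__":
    for raw in (-1, -2):  # assumed physical values, not read from the file
        print(raw, as_passthrough_int32(raw), as_truncated_unsigned(raw, 16))
    try:
        as_checked_unsigned(-2, 8)
    except ValueError as err:
        print("error:", err)
```

Under that assumption, a stored -1 stays -1 when passed through, becomes 65535 when truncated to 16 unsigned bits, and fails the range check when treated as a uint32 that must fit a narrower unsigned type.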
   
   Moving the test to more stable data types.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

