viirya commented on issue #44:
URL:
https://github.com/apache/arrow-datafusion-comet/issues/44#issuecomment-1955993169
Oh, I know what the root cause is now.
If you collect the data and dump it out, you will see the result is correct.
But `Dataset.show()` prints out an incorrect result.
What `Dataset.show()` actually does is project each column to string
type, collect the result, and then format it for printing.
So the incorrect result is caused by `cast(EventDate#5 as string)`.
I checked the input array to the `Cast` expression in Comet, and it is an Int32
array. So `Cast` performs an Int32-to-String conversion. That's why we see
integers there instead of date strings.
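
To illustrate the difference, here is a minimal Python sketch. It is not Comet's or Spark's actual code; it only assumes that a date value is stored as a count of days since 1970-01-01, which is how Spark's `DateType` and Arrow's `Date32` represent dates. The function names and the sample value `15901` are hypothetical.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)


def cast_int_to_string(value: int) -> str:
    # What an Int32 -> String cast produces: the digits of the integer.
    return str(value)


def cast_date_to_string(days_since_epoch: int) -> str:
    # What a Date -> String cast produces: the same stored integer,
    # interpreted as days since the Unix epoch and printed as yyyy-MM-dd.
    return (EPOCH + timedelta(days=days_since_epoch)).isoformat()


raw = 15901  # hypothetical stored value
print(cast_int_to_string(raw))   # 15901
print(cast_date_to_string(raw))  # 2013-07-15
```

The stored value is the same in both cases; only the type attached to the array decides which string comes out.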
I looked into the column `EventDate`. Its logical type in Parquet is actually
`Integer { bit_width: 16, is_signed: false }`, not `Date`. I think that in Spark,
since a read schema is specified (from the table definition), it can override the
logical type from the Parquet column. In Comet, I don't see that we do this.
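
As a rough sketch of the behavior I suspect, the toy Python model below lets a requested read type take precedence over the type recorded in the file. This is an assumption about the idea only, not Spark's or Comet's reader implementation; `render_column` and the type labels `"uint16"` and `"date"` are hypothetical.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)


def render_column(values, file_type, read_type=None):
    # If a read schema supplies a type, it wins over the file's own type.
    effective_type = read_type if read_type is not None else file_type
    if effective_type == "date":
        # Interpret each stored integer as days since the Unix epoch.
        return [(EPOCH + timedelta(days=v)).isoformat() for v in values]
    # Otherwise the values are plain integers.
    return [str(v) for v in values]


stored = [15901]  # hypothetical column contents
print(render_column(stored, "uint16"))                    # ['15901']
print(render_column(stored, "uint16", read_type="date"))  # ['2013-07-15']
```

Without the override, the column is treated as the file describes it (an unsigned 16-bit integer), which matches the integers that `Dataset.show()` printed.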