cdmikechen opened a new issue #2034:
URL: https://github.com/apache/hudi/issues/2034
**Describe the problem you faced**
When DeltaStreamer ingests Avro data from Kafka into Hudi, `DateType` values
are not converted to the correct date (for example `2020-08-24`). A
`DateType` column always shows `1970-01-01`.
**To Reproduce**
Steps to reproduce the behavior:
I use Debezium to capture some MySQL tables into Kafka, and then use
DeltaStreamer to write them to Hudi. I checked the columns and found that every
date-typed column always shows `1970-01-01`.
In `org.apache.hudi.AvroConversionHelper`, Hudi uses this code to convert an
int to a date:
```scala
case (DateType, INT) =>
  (item: AnyRef) =>
    if (item == null) {
      null
    } else {
      if (item.isInstanceOf[Integer]) {
        new Date(item.asInstanceOf[Integer].longValue())
      } else {
        new Date(item.asInstanceOf[Long])
      }
    }
```
I wrote a small test to check this:
```java
System.out.println(new java.sql.Date(18498));
System.out.println(org.apache.spark.sql.catalyst.util.DateTimeUtils.toJavaDate(18498));
```
Result:
```
1970-01-01
2020-08-24
```
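A likely explanation, offered as a sketch rather than a confirmed diagnosis: the `java.sql.Date(long)` constructor interprets its argument as milliseconds since the epoch, while an Avro `date` value (which is what I assume Debezium emits here) is a count of days since the epoch. So `18498` is read as about 18.5 seconds after `1970-01-01`. The helper `fromEpochDays` below is a hypothetical name used only for illustration and is not part of Hudi; it shows one way to do the day-based conversion with the standard library only.

```java
import java.sql.Date;
import java.time.LocalDate;

class EpochDayDemo {
    // Hypothetical helper (not a Hudi API): interpret the int as
    // days since 1970-01-01, which is the assumed meaning of the Avro value.
    static Date fromEpochDays(int days) {
        return Date.valueOf(LocalDate.ofEpochDay(days));
    }

    public static void main(String[] args) {
        int days = 18498;

        // The constructor takes milliseconds, so this is ~18.5 seconds
        // after the epoch and prints a date at (or next to) the epoch,
        // depending on the JVM time zone.
        System.out.println(new Date(days));

        // Day-based conversion of the same value.
        System.out.println(fromEpochDays(days)); // prints 2020-08-24
    }
}
```

This matches what `DateTimeUtils.toJavaDate` returned in the test above, which suggests the conversion in `AvroConversionHelper` should treat the value as days rather than milliseconds.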
**Environment Description**
* Hudi version : 0.6.0
* Spark version : 2.4.4
* Hive version : 2.3.3
* Hadoop version : 2.8.5
* Storage (HDFS/S3/GCS..) : HDFS
* Running on Docker? (yes/no) : no
**Additional context**
I think this is a bug, but I'm not sure whether anyone else has run into it and
can confirm that it happens in general.
If it really is a bug, I think we should open a PR to fix it.