cxzl25 opened a new pull request, #1930:
URL: https://github.com/apache/orc/pull/1930

   ### What changes were proposed in this pull request?
   This PR fixes an `IllegalArgumentException` thrown when the benchmark reads timestamp columns from JSON.
   
   When writing and reading JSON, timestamp values are now converted to a long instead of a string.
   
   ### Why are the changes needed?
   ORC-1191 switched the taxi source data from CSV to Parquet, so the benchmark now reads timestamps from Parquet. Those values are in microseconds, whereas Java's `java.sql.Timestamp` expects milliseconds.
   
   Parquet metadata of the taxi source data:
   ```
     optional int64 tpep_pickup_datetime (TIMESTAMP(MICROS,false));
     optional int64 tpep_dropoff_datetime (TIMESTAMP(MICROS,false));
   ```
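To illustrate the unit mismatch, here is a minimal sketch of how a microsecond value differs from what `java.sql.Timestamp` expects. The class `MicrosToTimestamp` and the helper `fromMicros` are hypothetical names used only for this example; they are not part of the ORC benchmark code or of this patch.

```java
import java.sql.Timestamp;

public class MicrosToTimestamp {
    // Hypothetical helper: builds a Timestamp from microseconds since the epoch.
    static Timestamp fromMicros(long micros) {
        long seconds = Math.floorDiv(micros, 1_000_000L);
        long microsOfSecond = Math.floorMod(micros, 1_000_000L);
        // The constructor takes milliseconds; the sub-second part is set via nanos.
        Timestamp ts = new Timestamp(seconds * 1000L);
        ts.setNanos((int) (microsOfSecond * 1000L));
        return ts;
    }

    public static void main(String[] args) {
        long micros = 1446341079000000L;
        // Passing microseconds directly treats them as milliseconds.
        Timestamp wrong = new Timestamp(micros);
        Timestamp right = fromMicros(micros);
        System.out.println(wrong.getTime()); // prints 1446341079000000
        System.out.println(right.getTime()); // prints 1446341079000
    }
}
```

The first value is 1000 times too large, which is why the resulting date lands tens of thousands of years in the future.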
   
   Writing this data to JSON and then running the scan command fails with the following error:
   ```bash
   java -jar core/target/orc-benchmarks-core-*-uber.jar scan data -format json
   ```
   
   ```
   Exception in thread "main" java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff]
        at java.sql/java.sql.Timestamp.valueOf(Timestamp.java:224)
        at org.apache.orc.bench.core.convert.json.JsonReader$TimestampColumnConverter.convert(JsonReader.java:175)
        at org.apache.orc.bench.core.convert.json.JsonReader.nextBatch(JsonReader.java:86)
        at org.apache.orc.bench.core.convert.ScanVariants.run(ScanVariants.java:92)
        at org.apache.orc.bench.core.Driver.main(Driver.java:64)
   ```
   
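   The cause is that JSON timestamp values are written with `java.sql.Timestamp#toString`, but the resulting string cannot be parsed back by `java.sql.Timestamp#valueOf` when the value is out of range, as the following example with a five-digit year shows.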
   
   ```java
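       // requires: import java.sql.Timestamp;
       // A microsecond value passed where milliseconds are expected.
       Timestamp ts = new Timestamp(1446341079000000L);
       System.out.println(ts);
       // Fails: valueOf cannot parse the string produced by toString.
       System.out.println(Timestamp.valueOf(ts.toString()));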
   ```
   ```
   47802-09-23 02:50:00.0
   Exception in thread "main" java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff]
        at java.sql.Timestamp.valueOf(Timestamp.java:237)
   ```
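For comparison, a round trip through a long value does not depend on string parsing. The following is only a sketch of the idea of storing a long instead of a string; `LongRoundTrip`, `encode`, and `decode` are hypothetical names, and the actual changes in `JsonReader` and the JSON writer may look different. Note that `getTime()` carries millisecond precision only.

```java
import java.sql.Timestamp;

public class LongRoundTrip {
    // Hypothetical writer side: emit epoch milliseconds instead of toString().
    static long encode(Timestamp ts) {
        return ts.getTime();
    }

    // Hypothetical reader side: rebuild the Timestamp from the long value.
    static Timestamp decode(long millis) {
        return new Timestamp(millis);
    }

    public static void main(String[] args) {
        // Same out-of-range value that breaks the toString/valueOf round trip.
        Timestamp ts = new Timestamp(1446341079000000L);
        Timestamp back = decode(encode(ts));
        System.out.println(ts.equals(back)); // prints true
    }
}
```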
   
   
   ### How was this patch tested?
   Tested locally by generating the taxi data in JSON format and then scanning it:
   
   ```bash
   java -jar core/target/orc-benchmarks-core-*-uber.jar generate data -format json -data taxi -compress snappy
   ```
   
   ```bash
   java -jar core/target/orc-benchmarks-core-*-uber.jar scan data -format json -data taxi -compress snappy
   ```
   
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No
   

