This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-2.0
in repository https://gitbox.apache.org/repos/asf/orc.git


The following commit(s) were added to refs/heads/branch-2.0 by this push:
     new 18235401e ORC-1697: Fix IllegalArgumentException when reading json timestamp type in benchmark
18235401e is described below

commit 18235401eea31af3ca7edf9f5cfdf18d0e694f37
Author: sychen <[email protected]>
AuthorDate: Sun Aug 4 19:21:31 2024 -0700

    ORC-1697: Fix IllegalArgumentException when reading json timestamp type in benchmark
    
    ### What changes were proposed in this pull request?
    This PR aims to fix the `IllegalArgumentException` thrown when reading JSON
    timestamp values in the benchmark.
    
    When writing and reading JSON, timestamp values are now converted to a long
    instead of a string.
    
    ### Why are the changes needed?
    ORC-1191 switched the taxi source data from CSV to Parquet. The benchmark now
    reads the Parquet timestamps, which are stored in microseconds, whereas the
    `java.sql.Timestamp(long)` constructor expects milliseconds.
    
    Taxi source Parquet metadata:
    ```
      optional int64 tpep_pickup_datetime (TIMESTAMP(MICROS,false));
      optional int64 tpep_dropoff_datetime (TIMESTAMP(MICROS,false));
    ```
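    
    A minimal sketch of the unit mismatch, not part of this patch: the
    hypothetical helper `fromMicros` converts a microsecond epoch value into a
    `java.sql.Timestamp`, while `yearIfTreatedAsMillis` shows what happens when
    the same value is passed straight to the constructor. The class and method
    names are illustrative assumptions.
    
    ```java
    import java.sql.Timestamp;
    
    public class MicrosToTimestamp {
      // Hypothetical helper (not an ORC API): build a Timestamp from epoch microseconds.
      static Timestamp fromMicros(long micros) {
        // The Timestamp(long) constructor expects epoch milliseconds.
        Timestamp ts = new Timestamp(Math.floorDiv(micros, 1000L));
        // setNanos replaces the whole fractional second, so pass the full
        // sub-second part of the input, scaled from microseconds to nanoseconds.
        ts.setNanos((int) (Math.floorMod(micros, 1_000_000L) * 1000L));
        return ts;
      }
    
      // Year obtained when a microsecond value is wrongly treated as milliseconds.
      static int yearIfTreatedAsMillis(long micros) {
        return new Timestamp(micros).toLocalDateTime().getYear();
      }
    
      public static void main(String[] args) {
        long micros = 1446341079000000L; // value used in the reproduction in this commit
        System.out.println("year if misread as millis: " + yearIfTreatedAsMillis(micros));
        System.out.println("epoch millis after conversion: " + fromMicros(micros).getTime());
      }
    }
    ```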
    
    When we write the data to JSON and then run the scan command, we get the
    following error.
    ```bash
    java -jar core/target/orc-benchmarks-core-*-uber.jar scan data -format json
    ```
    
    ```
    Exception in thread "main" java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff]
            at java.sql/java.sql.Timestamp.valueOf(Timestamp.java:224)
            at org.apache.orc.bench.core.convert.json.JsonReader$TimestampColumnConverter.convert(JsonReader.java:175)
            at org.apache.orc.bench.core.convert.json.JsonReader.nextBatch(JsonReader.java:86)
            at org.apache.orc.bench.core.convert.ScanVariants.run(ScanVariants.java:92)
            at org.apache.orc.bench.core.Driver.main(Driver.java:64)
    ```
    
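    The failure occurs because JSON timestamp values are written via
    `java.sql.Timestamp#toString` and read back via `java.sql.Timestamp#valueOf`.
    When a microsecond value is interpreted as milliseconds, the resulting year
    has more than four digits, which `valueOf` rejects.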
    
    ```java
        Timestamp ts = new Timestamp(1446341079000000L);
        System.out.println(ts);
        System.out.println(Timestamp.valueOf(ts.toString()));
    ```
    ```
    47802-09-23 02:50:00.0
    Exception in thread "main" java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff]
            at java.sql.Timestamp.valueOf(Timestamp.java:237)
    ```
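    
    For contrast, a small sketch of why a numeric round trip avoids the parsing
    step entirely. It uses `Timestamp#getTime` as a stand-in for the value the
    patch writes (the patch itself calls `getTimestampAsLong`, whose units are
    not stated in this commit), and the class and method names are hypothetical.
    
    ```java
    import java.sql.Timestamp;
    
    public class TimestampRoundTrip {
      // Round trip through a long: no text parsing, so the year range is irrelevant.
      static Timestamp viaLong(Timestamp ts) {
        return new Timestamp(ts.getTime());
      }
    
      // Round trip through the string form, as the code did before this patch.
      static Timestamp viaString(Timestamp ts) {
        return Timestamp.valueOf(ts.toString());
      }
    
      // True when the string round trip is rejected by Timestamp.valueOf.
      static boolean stringRoundTripFails(Timestamp ts) {
        try {
          viaString(ts);
          return false;
        } catch (IllegalArgumentException e) {
          return true;
        }
      }
    
      public static void main(String[] args) {
        // Microsecond value passed as milliseconds, as in the reproduction above.
        Timestamp ts = new Timestamp(1446341079000000L);
        System.out.println("string round trip fails: " + stringRoundTripFails(ts));
        System.out.println("long round trip equal: " + viaLong(ts).equals(ts));
      }
    }
    ```
    
    Note that a long carrying epoch milliseconds drops any sub-millisecond
    precision, which may be acceptable for a benchmark but is a trade-off to keep
    in mind.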
    
    ### How was this patch tested?
    Tested locally with the following commands:
    
    ```bash
    java -jar core/target/orc-benchmarks-core-*-uber.jar generate data -format json -data taxi -compress snappy
    ```
    
    ```bash
    java -jar core/target/orc-benchmarks-core-*-uber.jar scan data -format json -data taxi -compress snappy
    ```
    
    ### Was this patch authored or co-authored using generative AI tooling?
    No
    
    Closes #1902
    
    Closes #1930 from cxzl25/ORC-1697_v2.
    
    Authored-by: sychen <[email protected]>
    Signed-off-by: Dongjoon Hyun <[email protected]>
    (cherry picked from commit d09dbf344b0197751e2bd8a884953e01cbeca402)
    Signed-off-by: Dongjoon Hyun <[email protected]>
---
 .../src/java/org/apache/orc/bench/core/convert/json/JsonReader.java    | 3 +--
 .../src/java/org/apache/orc/bench/core/convert/json/JsonWriter.java    | 3 +--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/java/bench/core/src/java/org/apache/orc/bench/core/convert/json/JsonReader.java b/java/bench/core/src/java/org/apache/orc/bench/core/convert/json/JsonReader.java
index 893b738b1..a63d80b5b 100644
--- a/java/bench/core/src/java/org/apache/orc/bench/core/convert/json/JsonReader.java
+++ b/java/bench/core/src/java/org/apache/orc/bench/core/convert/json/JsonReader.java
@@ -172,8 +172,7 @@ public class JsonReader implements BatchReader {
         vect.isNull[row] = true;
       } else {
         TimestampColumnVector vector = (TimestampColumnVector) vect;
-        vector.set(row, Timestamp.valueOf(value.getAsString()
-            .replaceAll("[TZ]", " ")));
+        vector.set(row, new Timestamp(value.getAsLong()));
       }
     }
   }
diff --git a/java/bench/core/src/java/org/apache/orc/bench/core/convert/json/JsonWriter.java b/java/bench/core/src/java/org/apache/orc/bench/core/convert/json/JsonWriter.java
index 00b3de22e..527d8bf1c 100644
--- a/java/bench/core/src/java/org/apache/orc/bench/core/convert/json/JsonWriter.java
+++ b/java/bench/core/src/java/org/apache/orc/bench/core/convert/json/JsonWriter.java
@@ -160,8 +160,7 @@ public class JsonWriter implements BatchWriter {
               (int) ((LongColumnVector) vector).vector[row]).toString());
           break;
         case TIMESTAMP:
-          writer.value(((TimestampColumnVector) vector)
-              .asScratchTimestamp(row).toString());
+          writer.value(((TimestampColumnVector) vector).getTimestampAsLong(row));
           break;
         case LIST:
           printList(writer, (ListColumnVector) vector, schema, row);
