This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/orc.git
The following commit(s) were added to refs/heads/main by this push:
new d09dbf344 ORC-1697: Fix IllegalArgumentException when reading json timestamp type in benchmark
d09dbf344 is described below
commit d09dbf344b0197751e2bd8a884953e01cbeca402
Author: sychen <[email protected]>
AuthorDate: Sun Aug 4 19:21:31 2024 -0700
ORC-1697: Fix IllegalArgumentException when reading json timestamp type in benchmark
### What changes were proposed in this pull request?
This PR aims to fix the `IllegalArgumentException` thrown when reading JSON
timestamp values in the benchmark.
When writing and reading JSON, timestamp values are now converted to long
values instead of strings.
### Why are the changes needed?
ORC-1191 switched the taxi source data from CSV to Parquet. The benchmark now
reads the Parquet timestamps, which are stored in microseconds, whereas Java's
`java.sql.Timestamp` expects milliseconds.
Taxi source Parquet metadata:
```
optional int64 tpep_pickup_datetime (TIMESTAMP(MICROS,false));
optional int64 tpep_dropoff_datetime (TIMESTAMP(MICROS,false));
```
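To make the unit mismatch concrete, here is a minimal sketch using only the JDK. The class `MicrosVsMillis` and the helper `fromMicros` are hypothetical names introduced for illustration and are not part of the ORC benchmark code. The sketch compares passing a microsecond value straight to the `Timestamp` constructor with converting it first. This patch itself does not convert units; it only changes how the JSON reader and writer serialize timestamps.

```java
import java.sql.Timestamp;

public class MicrosVsMillis {
  // Hypothetical helper (not in the ORC code base): build a Timestamp
  // from a value expressed in microseconds since the epoch.
  static Timestamp fromMicros(long micros) {
    // The constructor expects milliseconds, so scale down first.
    Timestamp ts = new Timestamp(Math.floorDiv(micros, 1000L));
    // setNanos replaces the whole fractional second, so pass the full
    // sub-second part of the microsecond value, converted to nanoseconds.
    ts.setNanos((int) (Math.floorMod(micros, 1_000_000L) * 1000L));
    return ts;
  }

  public static void main(String[] args) {
    long micros = 1446341079000000L;

    // Treating microseconds as milliseconds lands far in the future.
    Timestamp wrong = new Timestamp(micros);
    Timestamp right = fromMicros(micros);

    // The two instants differ by a factor of 1000 in epoch milliseconds.
    System.out.println(wrong.getTime() / right.getTime());
  }
}
```

The printed dates depend on the JVM's default time zone, so the sketch compares epoch values rather than formatted strings.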
Writing this data to JSON and then running the scan command produces the
following error.
```bash
java -jar core/target/orc-benchmarks-core-*-uber.jar scan data -format json
```
```
Exception in thread "main" java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff]
    at java.sql/java.sql.Timestamp.valueOf(Timestamp.java:224)
    at org.apache.orc.bench.core.convert.json.JsonReader$TimestampColumnConverter.convert(JsonReader.java:175)
    at org.apache.orc.bench.core.convert.json.JsonReader.nextBatch(JsonReader.java:86)
    at org.apache.orc.bench.core.convert.ScanVariants.run(ScanVariants.java:92)
    at org.apache.orc.bench.core.Driver.main(Driver.java:64)
```
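The cause is that JSON timestamp values are written with
`java.sql.Timestamp#toString`. When a microsecond value has been interpreted
as milliseconds, the resulting string has a five-digit year, and
`java.sql.Timestamp#valueOf` rejects it when reading the data back.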
```java
// 1446341079000000L is a microsecond value, but the constructor expects milliseconds.
Timestamp ts = new Timestamp(1446341079000000L);
System.out.println(ts);
System.out.println(Timestamp.valueOf(ts.toString()));
```
```
47802-09-23 02:50:00.0
Exception in thread "main" java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff]
    at java.sql.Timestamp.valueOf(Timestamp.java:237)
```
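The following sketch contrasts the two serialization strategies using plain JDK calls only. The class `TimestampRoundTrip` and its two helper methods are hypothetical names for illustration. It stands in for the idea behind the fix (a numeric round trip instead of a string round trip) and does not reproduce the actual `JsonReader` or `JsonWriter` code or the ORC `TimestampColumnVector` API.

```java
import java.sql.Timestamp;

public class TimestampRoundTrip {
  // Hypothetical check: does toString() followed by valueOf() give back
  // the same timestamp?
  static boolean stringRoundTrips(Timestamp ts) {
    try {
      return Timestamp.valueOf(ts.toString()).equals(ts);
    } catch (IllegalArgumentException e) {
      // valueOf rejects strings that are not yyyy-mm-dd hh:mm:ss[.fffffffff].
      return false;
    }
  }

  // Hypothetical check: does a long (epoch milliseconds) round trip
  // preserve the instant?
  static boolean longRoundTrips(Timestamp ts) {
    long serialized = ts.getTime();
    return new Timestamp(serialized).getTime() == ts.getTime();
  }

  public static void main(String[] args) {
    // Same out-of-range value as in the reproduction above.
    Timestamp ts = new Timestamp(1446341079000000L);
    System.out.println("string round trip ok: " + stringRoundTrips(ts));
    System.out.println("long round trip ok:   " + longRoundTrips(ts));
  }
}
```

A numeric representation has no year-width restriction, so it survives the round trip even for values that the string format cannot express.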
### How was this patch tested?
Tested locally:
```bash
java -jar core/target/orc-benchmarks-core-*-uber.jar generate data -format json -data taxi -compress snappy
```
```bash
java -jar core/target/orc-benchmarks-core-*-uber.jar scan data -format json -data taxi -compress snappy
```
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #1902
Closes #1930 from cxzl25/ORC-1697_v2.
Authored-by: sychen <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
---
.../src/java/org/apache/orc/bench/core/convert/json/JsonReader.java | 3 +--
.../src/java/org/apache/orc/bench/core/convert/json/JsonWriter.java | 3 +--
2 files changed, 2 insertions(+), 4 deletions(-)
diff --git a/java/bench/core/src/java/org/apache/orc/bench/core/convert/json/JsonReader.java b/java/bench/core/src/java/org/apache/orc/bench/core/convert/json/JsonReader.java
index 893b738b1..a63d80b5b 100644
--- a/java/bench/core/src/java/org/apache/orc/bench/core/convert/json/JsonReader.java
+++ b/java/bench/core/src/java/org/apache/orc/bench/core/convert/json/JsonReader.java
@@ -172,8 +172,7 @@ public class JsonReader implements BatchReader {
vect.isNull[row] = true;
} else {
TimestampColumnVector vector = (TimestampColumnVector) vect;
- vector.set(row, Timestamp.valueOf(value.getAsString()
- .replaceAll("[TZ]", " ")));
+ vector.set(row, new Timestamp(value.getAsLong()));
}
}
}
diff --git a/java/bench/core/src/java/org/apache/orc/bench/core/convert/json/JsonWriter.java b/java/bench/core/src/java/org/apache/orc/bench/core/convert/json/JsonWriter.java
index 00b3de22e..527d8bf1c 100644
--- a/java/bench/core/src/java/org/apache/orc/bench/core/convert/json/JsonWriter.java
+++ b/java/bench/core/src/java/org/apache/orc/bench/core/convert/json/JsonWriter.java
@@ -160,8 +160,7 @@ public class JsonWriter implements BatchWriter {
(int) ((LongColumnVector) vector).vector[row]).toString());
break;
case TIMESTAMP:
- writer.value(((TimestampColumnVector) vector)
- .asScratchTimestamp(row).toString());
+ writer.value(((TimestampColumnVector) vector).getTimestampAsLong(row));
break;
case LIST:
printList(writer, (ListColumnVector) vector, schema, row);