[ https://issues.apache.org/jira/browse/SPARK-24322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun updated SPARK-24322:
----------------------------------
Description:
ORC 1.4.4 includes [nine
fixes|https://issues.apache.org/jira/issues/?filter=12342568&jql=project%20%3D%20ORC%20AND%20resolution%20%3D%20Fixed%20AND%20fixVersion%20%3D%201.4.4].
One of them is a `Timestamp` bug (ORC-306) that occurs when the `native` ORC
vectorized reader reads the `times` and `nanos` sub-vectors of an ORC column
vector. ORC-306 fixes this according to the [original
definition|https://github.com/apache/hive/blob/master/storage-api/src/java/org/apache/hadoop/hive/ql/exec/vector/TimestampColumnVector.java#L45-L46],
and the linked PR includes the updated interpretation of ORC column vectors.
Note that the `hive` ORC reader and the ORC MR reader are not affected.
{code}
scala> spark.version
res0: String = 2.3.0
scala> spark.sql("set spark.sql.orc.impl=native")
scala> Seq(java.sql.Timestamp.valueOf("1900-05-05 12:34:56.000789")).toDF().write.orc("/tmp/orc")
scala> spark.read.orc("/tmp/orc").show(false)
+--------------------------+
|value                     |
+--------------------------+
|1900-05-05 12:34:55.000789|
+--------------------------+
{code}
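As a rough illustration of the definition linked above, the sketch below assumes that `time` holds `Timestamp.getTime()` (milliseconds since the epoch) and `nanos` holds `Timestamp.getNanos()` (the full sub-second fraction). The helper `toMicros` is hypothetical and is not the code in the linked PR or in ORC. It only shows that, under this reading, flooring to whole seconds before adding the fraction preserves a pre-1970 value such as the one in the reproduction.

```scala
import java.sql.Timestamp

// Hypothetical helper, not Spark's or ORC's actual code.
// Assumption: time = Timestamp.getTime(), nanos = Timestamp.getNanos().
def toMicros(time: Long, nanos: Int): Long = {
  // floorDiv rounds toward negative infinity, so pre-1970 values
  // land on the correct whole second before the fraction is added.
  val seconds = Math.floorDiv(time, 1000L)
  seconds * 1000000L + nanos / 1000
}

val ts = Timestamp.valueOf("1900-05-05 12:34:56.000789")
val micros = toMicros(ts.getTime, ts.getNanos)

// Rebuild a Timestamp from the microsecond value to check the round trip.
val back = new Timestamp(Math.floorDiv(micros, 1000000L) * 1000L)
back.setNanos((Math.floorMod(micros, 1000000L) * 1000L).toInt)
println(back)
```

If the two sub-vectors are combined under a different assumption about what `time` contains, values before the epoch can shift by a whole second, which is the kind of discrepancy the reproduction shows.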
This issue aims to update Apache Spark to use ORC 1.4.4.
*FULL LIST*
|| ID || TITLE ||
| ORC-281 | Fix compiler warnings from clang 5.0 |
| ORC-301 | `extractFileTail` should open a file in `try` statement |
| ORC-304 | Fix TestRecordReaderImpl to not fail with new storage-api |
| ORC-306 | Fix incorrect workaround for bug in java.sql.Timestamp |
| ORC-324 | Add support for ARM and PPC arch |
| ORC-330 | Remove unnecessary Hive artifacts from root pom |
| ORC-332 | Add syntax version to orc_proto.proto |
| ORC-336 | Remove avro and parquet dependency management entries |
| ORC-360 | Implement error checking on subtype fields in Java |
was:
ORC 1.4.4 includes [nine
fixes|https://issues.apache.org/jira/issues/?filter=12342568&jql=project%20%3D%20ORC%20AND%20resolution%20%3D%20Fixed%20AND%20fixVersion%20%3D%201.4.4).
One of them is a `Timestamp` bug (ORC-306) that occurs when the `native` ORC
vectorized reader reads the `times` and `nanos` sub-vectors of an ORC column
vector. ORC-306 fixes this according to the [original
definition](https://github.com/apache/hive/blob/master/storage-api/src/java/org/apache/hadoop/hive/ql/exec/vector/TimestampColumnVector.java#L45-L46)
and the linked PR includes the updated interpretation of ORC column vectors.
Note that the `hive` ORC reader and the ORC MR reader are not affected.
{code}
scala> spark.version
res0: String = 2.3.0
scala> spark.sql("set spark.sql.orc.impl=native")
scala> Seq(java.sql.Timestamp.valueOf("1900-05-05 12:34:56.000789")).toDF().write.orc("/tmp/orc")
scala> spark.read.orc("/tmp/orc").show(false)
+--------------------------+
|value                     |
+--------------------------+
|1900-05-05 12:34:55.000789|
+--------------------------+
{code}
This issue aims to update Apache Spark to use ORC 1.4.4.
*FULL LIST*
|| ID || TITLE ||
| ORC-281 | Fix compiler warnings from clang 5.0 |
| ORC-301 | `extractFileTail` should open a file in `try` statement |
| ORC-304 | Fix TestRecordReaderImpl to not fail with new storage-api |
| ORC-306 | Fix incorrect workaround for bug in java.sql.Timestamp |
| ORC-324 | Add support for ARM and PPC arch |
| ORC-330 | Remove unnecessary Hive artifacts from root pom |
| ORC-332 | Add syntax version to orc_proto.proto |
| ORC-336 | Remove avro and parquet dependency management entries |
| ORC-360 | Implement error checking on subtype fields in Java |
> Upgrade Apache ORC to 1.4.4
> ---------------------------
>
> Key: SPARK-24322
> URL: https://issues.apache.org/jira/browse/SPARK-24322
> Project: Spark
> Issue Type: Bug
> Components: Build
> Affects Versions: 2.4.0
> Reporter: Dongjoon Hyun
> Priority: Major
> Labels: correctness