This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
     new 4f43421a5b3 [SPARK-39260][SQL] Use `Reader.getSchema` instead of `Reader.getTypes` in `SparkOrcNewRecordReader`
4f43421a5b3 is described below
commit 4f43421a5b33988a841c49d11d8b916e9d4414f4
Author: Dongjoon Hyun <[email protected]>
AuthorDate: Mon May 23 14:29:51 2022 -0700
    [SPARK-39260][SQL] Use `Reader.getSchema` instead of `Reader.getTypes` in `SparkOrcNewRecordReader`
### What changes were proposed in this pull request?
This PR aims to use `org.apache.orc.Reader.getSchema` instead of
`org.apache.orc.Reader.getTypes` in `SparkOrcNewRecordReader`
### Why are the changes needed?
`getTypes` has been deprecated since ORC 1.1.0, and this is its only remaining usage in Apache Spark.
- https://github.com/apache/orc/blob/main/java/core/src/java/org/apache/orc/Reader.java#L144
```java
  /**
   * Get the list of types contained in the file. The root type is the first
   * type in the list.
   * @return the list of flattened types
   * @deprecated use getSchema instead
   * @since 1.1.0
   */
  List<OrcProto.Type> getTypes();
```
In addition, the AS-IS implementation is only a slow wrapper.
- https://github.com/apache/orc/blob/1e2962064b209f1b00188877f08d4226da85c640/java/core/src/java/org/apache/orc/impl/ReaderImpl.java#L259-L262
```java
@Override
public List<OrcProto.Type> getTypes() {
  return OrcUtils.getOrcTypes(schema);
}
```
- https://github.com/apache/orc/blob/1e2962064b209f1b00188877f08d4226da85c640/java/core/src/java/org/apache/orc/OrcUtils.java#L108-L112
```java
public static List<OrcProto.Type> getOrcTypes(TypeDescription typeDescr) {
  List<OrcProto.Type> result = new ArrayList<>();
  appendOrcTypes(result, typeDescr);
  return result;
}
```
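For context, here is a minimal stdlib-only sketch (using a hypothetical `StructType` record as a stand-in, not the real ORC `TypeDescription` or `OrcProto.Type`) of why counting the root schema's children is equivalent to the old `getTypes().get(0).getSubtypesCount()` path:

```java
import java.util.List;

// Hypothetical stand-in for ORC's root struct type. In real ORC,
// Reader.getSchema() returns a TypeDescription whose getChildren()
// lists the top-level columns, while the deprecated
// getTypes().get(0).getSubtypesCount() counted those same columns
// after flattening the schema into a list.
public class SchemaSketch {
  record StructType(List<String> children) {
    // New path (modeled): size of the children list.
    int childCount() {
      return children.size();
    }

    // Old path (modeled): subtype count of the flattened root type.
    int subtypesCount() {
      return children.size();
    }
  }

  public static void main(String[] args) {
    StructType root = new StructType(List.of("id", "name", "score"));
    // Both paths count the same top-level columns.
    System.out.println(root.childCount() == root.subtypesCount()); // true
    System.out.println(root.childCount()); // 3
  }
}
```

The difference is purely in cost: the old path first materialized the whole flattened type list, whereas `getSchema().getChildren()` reads the already-parsed schema directly.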
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the CIs.
Closes #36638 from dongjoon-hyun/SPARK-39260.
Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
---
.../org/apache/hadoop/hive/ql/io/orc/SparkOrcNewRecordReader.java | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/sql/hive/src/main/java/org/apache/hadoop/hive/ql/io/orc/SparkOrcNewRecordReader.java b/sql/hive/src/main/java/org/apache/hadoop/hive/ql/io/orc/SparkOrcNewRecordReader.java
index 8e9362ab8af..255c39051d1 100644
--- a/sql/hive/src/main/java/org/apache/hadoop/hive/ql/io/orc/SparkOrcNewRecordReader.java
+++ b/sql/hive/src/main/java/org/apache/hadoop/hive/ql/io/orc/SparkOrcNewRecordReader.java
@@ -41,11 +41,9 @@ public class SparkOrcNewRecordReader extends
   public SparkOrcNewRecordReader(Reader file, Configuration conf,
       long offset, long length) throws IOException {
-    if (file.getTypes().isEmpty()) {
-      numColumns = 0;
-    } else {
-      numColumns = file.getTypes().get(0).getSubtypesCount();
-    }
+    // TypeDescription.children is null in case of primitive types.
+    // However, it doesn't happen on Reader.getSchema()
+    numColumns = file.getSchema().getChildren().size();
     value = new OrcStruct(numColumns);
     this.reader = OrcInputFormat.createReaderFromFile(file, conf, offset,
         length);
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]