This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 4f43421a5b3 [SPARK-39260][SQL] Use `Reader.getSchema` instead of `Reader.getTypes` in `SparkOrcNewRecordReader`
4f43421a5b3 is described below

commit 4f43421a5b33988a841c49d11d8b916e9d4414f4
Author: Dongjoon Hyun <[email protected]>
AuthorDate: Mon May 23 14:29:51 2022 -0700

    [SPARK-39260][SQL] Use `Reader.getSchema` instead of `Reader.getTypes` in `SparkOrcNewRecordReader`
    
    ### What changes were proposed in this pull request?
    
    This PR aims to use `org.apache.orc.Reader.getSchema` instead of `org.apache.orc.Reader.getTypes` in `SparkOrcNewRecordReader`.
    
    ### Why are the changes needed?
    
    `getTypes` is deprecated, and this is its only remaining usage in Apache Spark.
    
    - https://github.com/apache/orc/blob/main/java/core/src/java/org/apache/orc/Reader.java#L144
    ```java
      /**
       * Get the list of types contained in the file. The root type is the first
       * type in the list.
       * @return the list of flattened types
       * @deprecated use getSchema instead
       * @since 1.1.0
       */
      List<OrcProto.Type> getTypes();
    ```
    
    In addition, the as-is implementation is only a slow wrapper.
    - https://github.com/apache/orc/blob/1e2962064b209f1b00188877f08d4226da85c640/java/core/src/java/org/apache/orc/impl/ReaderImpl.java#L259-L262
    ```java
      @Override
      public List<OrcProto.Type> getTypes() {
        return OrcUtils.getOrcTypes(schema);
      }
    ```
    
    - https://github.com/apache/orc/blob/1e2962064b209f1b00188877f08d4226da85c640/java/core/src/java/org/apache/orc/OrcUtils.java#L108-L112
    ```java
      public static List<OrcProto.Type> getOrcTypes(TypeDescription typeDescr) {
        List<OrcProto.Type> result = new ArrayList<>();
        appendOrcTypes(result, typeDescr);
        return result;
      }
    ```
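    The snippets above show why `getTypes` is a slow wrapper: it materializes a flattened list of the entire schema tree even when the caller only needs the root's child count. A minimal self-contained sketch of that pattern (all class and method names here are hypothetical stand-ins, not the real ORC API):

    ```java
    import java.util.ArrayList;
    import java.util.List;

    public class FlattenSketch {
      // Hypothetical stand-in for a schema node (not the real TypeDescription).
      static class Node {
        final String kind;
        final List<Node> children = new ArrayList<>();
        Node(String kind) { this.kind = kind; }
        Node add(Node child) { children.add(child); return this; }
      }

      // Models the getOrcTypes pattern: pre-order flatten of the whole tree,
      // allocating a list whose size is the total number of nodes.
      static List<Node> flatten(Node root) {
        List<Node> out = new ArrayList<>();
        walk(out, root);
        return out;
      }

      private static void walk(List<Node> out, Node node) {
        out.add(node);
        for (Node child : node.children) walk(out, child);
      }

      public static void main(String[] args) {
        Node schema = new Node("struct")
            .add(new Node("int"))
            .add(new Node("struct").add(new Node("string")));

        // Old path: flatten every node, then read only element 0.
        int oldCount = flatten(schema).get(0).children.size();
        // New path: ask the root directly, no intermediate list.
        int newCount = schema.children.size();

        System.out.println(oldCount == newCount); // true: both count 2 root children
      }
    }
    ```

    Both paths yield the same count for a struct root, which is why the patch can drop the flattened-list round trip entirely.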
    
    ### Does this PR introduce _any_ user-facing change?
    
    No.
    
    ### How was this patch tested?
    
    Pass the CIs.
    
    Closes #36638 from dongjoon-hyun/SPARK-39260.
    
    Authored-by: Dongjoon Hyun <[email protected]>
    Signed-off-by: Dongjoon Hyun <[email protected]>
---
 .../org/apache/hadoop/hive/ql/io/orc/SparkOrcNewRecordReader.java | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/sql/hive/src/main/java/org/apache/hadoop/hive/ql/io/orc/SparkOrcNewRecordReader.java b/sql/hive/src/main/java/org/apache/hadoop/hive/ql/io/orc/SparkOrcNewRecordReader.java
index 8e9362ab8af..255c39051d1 100644
--- a/sql/hive/src/main/java/org/apache/hadoop/hive/ql/io/orc/SparkOrcNewRecordReader.java
+++ b/sql/hive/src/main/java/org/apache/hadoop/hive/ql/io/orc/SparkOrcNewRecordReader.java
@@ -41,11 +41,9 @@ public class SparkOrcNewRecordReader extends
 
   public SparkOrcNewRecordReader(Reader file, Configuration conf,
       long offset, long length) throws IOException {
-    if (file.getTypes().isEmpty()) {
-      numColumns = 0;
-    } else {
-      numColumns = file.getTypes().get(0).getSubtypesCount();
-    }
+    // TypeDescription.children is null in case of primitive types.
+    // However, it doesn't happen on Reader.getSchema()
+    numColumns = file.getSchema().getChildren().size();
     value = new OrcStruct(numColumns);
     this.reader = OrcInputFormat.createReaderFromFile(file, conf, offset,
         length);


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
