[ 
https://issues.apache.org/jira/browse/HIVE-26551?focusedWorklogId=811018&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-811018
 ]

ASF GitHub Bot logged work on HIVE-26551:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 22/Sep/22 05:17
            Start Date: 22/Sep/22 05:17
    Worklog Time Spent: 10m 
      Work Description: ayushtkn commented on code in PR #3611:
URL: https://github.com/apache/hive/pull/3611#discussion_r977206491


##########
common/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java:
##########
@@ -524,6 +524,7 @@ public enum ErrorMsg {
   CTLF_MISSING_STORAGE_FORMAT_DESCRIPTOR(20021, "Failed to find 
StorageFormatDescriptor for file format ''{0}''", true),
   PARQUET_FOOTER_ERROR(20022, "Failed to read parquet footer:"),
   PARQUET_UNHANDLED_TYPE(20023, "Unhandled type {0}", true),
+  ORC_FOOTER_ERROR(20024, "Failed to read orc footer:"),

Review Comment:
   Well, the colon is there in ``PARQUET_FOOTER_ERROR`` too, but I am not sure why. When this exception is thrown, what follows the colon is the stack trace; in my opinion it should have been a period, but let it stay as is to remain in sync with 
PARQUET_FOOTER_ERROR. Just out of curiosity: if you know the reason behind it, 
do let me know.



##########
ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSerde.java:
##########
@@ -117,4 +130,87 @@ public ObjectInspector getObjectInspector() throws 
SerDeException {
     return inspector;
   }
 
+  @Override
+  public List<FieldSchema> readSchema(Configuration conf, String file) throws 
SerDeException {
+    List<String> fieldNames;
+    List<TypeDescription> fieldTypes;
+    try (Reader reader = OrcFile.createReader(new Path(file), 
OrcFile.readerOptions(conf))) {
+      fieldNames = reader.getSchema().getFieldNames();
+      fieldTypes =  reader.getSchema().getChildren();
+    } catch (Exception e) {
+      throw new SerDeException(ErrorMsg.ORC_FOOTER_ERROR.getErrorCodedMsg(), 
e);
+    }
+
+    List<FieldSchema> schema = new ArrayList<>();
+    for (int i = 0; i < fieldNames.size(); i++) {
+      FieldSchema fieldSchema = convertOrcTypeToFieldSchema(fieldNames.get(i), 
fieldTypes.get(i));
+      schema.add(fieldSchema);
+      LOG.debug("Inferred field schema {}", fieldSchema);
+    }
+    return schema;
+  }
+
+  private FieldSchema convertOrcTypeToFieldSchema(String fieldName, 
TypeDescription fieldType) {
+    String typeName = convertOrcTypeToFieldType(fieldType);
+    return new FieldSchema(fieldName, typeName, "Inferred from Orc file.");
+  }
+
+  private String convertOrcTypeToFieldType(TypeDescription fieldType) {
+    if (fieldType.getCategory().isPrimitive()) {
+        return convertPrimitiveType(fieldType);
+      }
+    return convertComplexType(fieldType);
+  }
+
+  private String convertPrimitiveType(TypeDescription fieldType) {
+    if (fieldType.getCategory().getName().equals("timestamp with local time 
zone")) {
+      throw new IllegalArgumentException("Unhandled ORC type " + 
fieldType.getCategory().getName());
+    }
+    return fieldType.toString();
+  }
+
+  private String convertComplexType(TypeDescription fieldType) {
+    StringBuilder buffer = new StringBuilder();
+    buffer.append(fieldType.getCategory().getName());
+    switch (fieldType.getCategory()) {
+    case LIST:
+    case MAP:
+    case UNION:
+      buffer.append('<');
+      for (int i = 0; i < fieldType.getChildren().size(); i++) {
+        if (i != 0) {
+          buffer.append(',');
+        }
+        
buffer.append(convertOrcTypeToFieldType(fieldType.getChildren().get(i)));
+      }
+      buffer.append('>');
+      break;
+    case STRUCT:
+      buffer.append('<');
+      for(int i=0; i < fieldType.getChildren().size(); ++i) {
+        if (i != 0) {
+          buffer.append(',');
+        }
+        getStructFieldName(buffer, fieldType.getFieldNames().get(i));
+        buffer.append(':');
+        
buffer.append(convertOrcTypeToFieldType(fieldType.getChildren().get(i)));
+      }
+      buffer.append('>');
+      break;
+    default:
+      throw new IllegalArgumentException("ORC doesn't handle " +
+          fieldType.getCategory());
+    }
+    return buffer.toString();
+  }
+
+  static void getStructFieldName(StringBuilder buffer, String name) {
+    if (UNQUOTED_NAMES.matcher(name).matches()) {
+      buffer.append(name);
+    } else {
+      buffer.append('`');
+      buffer.append(name.replace("`", "``"));
+      buffer.append('`');

Review Comment:
   nit: `StringBuilder.append` returns the builder itself, so the calls can be chained:
   ```
         buffer.append('`').append(name.replace("`", "``")).append('`');
   ```



##########
ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSerde.java:
##########
@@ -117,4 +130,87 @@ public ObjectInspector getObjectInspector() throws 
SerDeException {
     return inspector;
   }
 
+  @Override
+  public List<FieldSchema> readSchema(Configuration conf, String file) throws 
SerDeException {
+    List<String> fieldNames;
+    List<TypeDescription> fieldTypes;
+    try (Reader reader = OrcFile.createReader(new Path(file), 
OrcFile.readerOptions(conf))) {
+      fieldNames = reader.getSchema().getFieldNames();
+      fieldTypes =  reader.getSchema().getChildren();
+    } catch (Exception e) {
+      throw new SerDeException(ErrorMsg.ORC_FOOTER_ERROR.getErrorCodedMsg(), 
e);
+    }
+
+    List<FieldSchema> schema = new ArrayList<>();
+    for (int i = 0; i < fieldNames.size(); i++) {
+      FieldSchema fieldSchema = convertOrcTypeToFieldSchema(fieldNames.get(i), 
fieldTypes.get(i));
+      schema.add(fieldSchema);
+      LOG.debug("Inferred field schema {}", fieldSchema);
+    }
+    return schema;
+  }
+
+  private FieldSchema convertOrcTypeToFieldSchema(String fieldName, 
TypeDescription fieldType) {
+    String typeName = convertOrcTypeToFieldType(fieldType);
+    return new FieldSchema(fieldName, typeName, "Inferred from Orc file.");
+  }
+
+  private String convertOrcTypeToFieldType(TypeDescription fieldType) {
+    if (fieldType.getCategory().isPrimitive()) {
+        return convertPrimitiveType(fieldType);
+      }
+    return convertComplexType(fieldType);
+  }
+
+  private String convertPrimitiveType(TypeDescription fieldType) {
+    if (fieldType.getCategory().getName().equals("timestamp with local time 
zone")) {
+      throw new IllegalArgumentException("Unhandled ORC type " + 
fieldType.getCategory().getName());
+    }
+    return fieldType.toString();
+  }
+
+  private String convertComplexType(TypeDescription fieldType) {
+    StringBuilder buffer = new StringBuilder();
+    buffer.append(fieldType.getCategory().getName());
+    switch (fieldType.getCategory()) {
+    case LIST:
+    case MAP:
+    case UNION:
+      buffer.append('<');
+      for (int i = 0; i < fieldType.getChildren().size(); i++) {
+        if (i != 0) {
+          buffer.append(',');
+        }
+        
buffer.append(convertOrcTypeToFieldType(fieldType.getChildren().get(i)));
+      }
+      buffer.append('>');
+      break;
+    case STRUCT:
+      buffer.append('<');
+      for(int i=0; i < fieldType.getChildren().size(); ++i) {
+        if (i != 0) {
+          buffer.append(',');
+        }
+        getStructFieldName(buffer, fieldType.getFieldNames().get(i));
+        buffer.append(':');
+        
buffer.append(convertOrcTypeToFieldType(fieldType.getChildren().get(i)));
+      }
+      buffer.append('>');
+      break;
+    default:
+      throw new IllegalArgumentException("ORC doesn't handle " +
+          fieldType.getCategory());
+    }
+    return buffer.toString();
+  }
+
+  static void getStructFieldName(StringBuilder buffer, String name) {
+    if (UNQUOTED_NAMES.matcher(name).matches()) {

Review Comment:
   Is the intent of this just to ensure that `name` is not quoted? I am not 
sure, but do explore 
   ```
   !name.startsWith("'") && !name.endsWith("'")
   ```
   which ends up being cheaper.





Issue Time Tracking
-------------------

    Worklog Id:     (was: 811018)
    Time Spent: 50m  (was: 40m)

> Support CREATE TABLE LIKE FILE for ORC
> --------------------------------------
>
>                 Key: HIVE-26551
>                 URL: https://issues.apache.org/jira/browse/HIVE-26551
>             Project: Hive
>          Issue Type: New Feature
>          Components: HiveServer2
>    Affects Versions: 4.0.0-alpha-2
>            Reporter: zhangbutao
>            Assignee: zhangbutao
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> https://issues.apache.org/jira/browse/HIVE-26395 added the ability to create 
> table based on the existing parquet files. We can continue to support 
> creating table based on existing orc files.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to