[ https://issues.apache.org/jira/browse/ORC-54?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15364736#comment-15364736 ]

ASF GitHub Bot commented on ORC-54:
-----------------------------------

Github user omalley commented on a diff in the pull request:

    https://github.com/apache/orc/pull/40#discussion_r69778493
  
    --- Diff: java/core/src/java/org/apache/orc/impl/SchemaEvolution.java ---
    @@ -18,25 +18,33 @@
     
     package org.apache.orc.impl;
     
    -import java.io.IOException;
     import java.util.ArrayList;
     import java.util.HashMap;
     import java.util.List;
     import java.util.Map;
    +import java.util.regex.Pattern;
     
     import org.apache.orc.TypeDescription;
     import org.slf4j.Logger;
     import org.slf4j.LoggerFactory;
     
     /**
    - * Take the file types and the (optional) configuration column names/types and see if there
    - * has been schema evolution.
    + * Infer and track the evolution between the schema as stored in the file and the schema that has been requested by the
    + * reader.
      */
     public class SchemaEvolution {
    +
    +  public static class IllegalEvolutionException extends RuntimeException {
    +    public IllegalEvolutionException(String msg) {
    +      super(msg);
    +    }
    +  }
    +
       private final Map<Integer, TypeDescription> readerToFile;
       private final boolean[] included;
       private final TypeDescription readerSchema;
       private static final Logger LOG = LoggerFactory.getLogger(SchemaEvolution.class);
    +  private static final Pattern missingMetadataPattern = Pattern.compile("_col\\d+");
    --- End diff --
    
    You should check for WriterVersion < HIVE_4243 before bothering to check 
the column names. The vast majority of files from before HIVE_4243 have the 
synthetic column names.
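
    One possible reading of the suggestion above, as a minimal sketch rather than
    the actual patch: gate the regex scan on the writer version so that newer files
    skip it entirely. The WriterVersion enum below is a hypothetical stand-in for
    the real writer-version type (only the relative order of its constants matters
    here), and the class and method names are illustrative assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class SyntheticNameCheck {

  // Hypothetical stand-in for the ORC writer version; declared oldest first,
  // so compareTo() reflects "written before / after".
  enum WriterVersion { ORIGINAL, HIVE_8732, HIVE_4243, HIVE_12055 }

  // Same pattern as in the diff: synthetic names look like _col0, _col1, ...
  private static final Pattern MISSING_METADATA_PATTERN =
      Pattern.compile("_col\\d+");

  /**
   * Returns true only if the file predates HIVE_4243 and every field name
   * is synthetic. Files written at HIVE_4243 or later are never scanned.
   */
  static boolean hasSyntheticNames(WriterVersion version,
                                   List<String> fieldNames) {
    // Cheap check first: newer writers are assumed to store real names.
    if (version.compareTo(WriterVersion.HIVE_4243) >= 0) {
      return false;
    }
    // Only old files pay for the per-name regex match.
    for (String name : fieldNames) {
      if (!MISSING_METADATA_PATTERN.matcher(name).matches()) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    List<String> names = Arrays.asList("_col0", "_col1");
    System.out.println(hasSyntheticNames(WriterVersion.ORIGINAL, names));
    System.out.println(hasSyntheticNames(WriterVersion.HIVE_12055, names));
  }
}
```

    Whether an old file with real names should still be matched by name is a
    separate decision; this sketch only shows the ordering of the two checks.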


> Evolve schemas based on field name rather than index
> ----------------------------------------------------
>
>                 Key: ORC-54
>                 URL: https://issues.apache.org/jira/browse/ORC-54
>             Project: Orc
>          Issue Type: Improvement
>            Reporter: Mark Wagner
>            Assignee: Mark Wagner
>
> Schema evolution as it stands today allows adding fields to the end of 
> schemas or removing them from the end. However, because it is based on the 
> index of the column, you can only ever add or remove -- not both.
> ORC files have the full schema information of their contents, so there's 
> actually enough metadata to support changing columns anywhere in the schema.
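
The difference between index-based and name-based matching described above can
be illustrated with a small sketch. It is not the ORC implementation: schemas
are reduced to flat lists of field names, matching is assumed to be exact and
case-sensitive, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class NameBasedMapping {

  /**
   * For each reader field, returns the position of the same-named field in
   * the file schema, or -1 if the file has no such field (for example, a
   * column added after the file was written).
   */
  static int[] mapReaderToFile(List<String> fileFields,
                               List<String> readerFields) {
    // Index the file's fields by name once.
    Map<String, Integer> fileIndex = new HashMap<>();
    for (int i = 0; i < fileFields.size(); i++) {
      fileIndex.put(fileFields.get(i), i);
    }
    // Resolve each reader field by name instead of by position.
    int[] result = new int[readerFields.size()];
    for (int r = 0; r < readerFields.size(); r++) {
      Integer idx = fileIndex.get(readerFields.get(r));
      result[r] = (idx == null) ? -1 : idx;
    }
    return result;
  }

  public static void main(String[] args) {
    // The reader dropped "b", added "x", and reordered the rest.
    int[] mapping = mapReaderToFile(
        Arrays.asList("a", "b", "c"),
        Arrays.asList("c", "x", "a"));
    System.out.println(Arrays.toString(mapping));
  }
}
```

With index-based matching the reader's first column would be paired with the
file's first column regardless of name, which is why adding and removing
columns cannot both be supported that way.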



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
