[GitHub] orc pull request #40: ORC-54: Evolve schemas based on field name rather than...

omalley Wed, 06 Jul 2016 10:58:25 -0700

Github user omalley commented on a diff in the pull request:

    https://github.com/apache/orc/pull/40#discussion_r69778493
  
    --- Diff: java/core/src/java/org/apache/orc/impl/SchemaEvolution.java ---
    @@ -18,25 +18,33 @@
     
     package org.apache.orc.impl;
     
    -import java.io.IOException;
     import java.util.ArrayList;
     import java.util.HashMap;
     import java.util.List;
     import java.util.Map;
    +import java.util.regex.Pattern;
     
     import org.apache.orc.TypeDescription;
     import org.slf4j.Logger;
     import org.slf4j.LoggerFactory;
     
     /**
    - * Take the file types and the (optional) configuration column names/types and see if there
    - * has been schema evolution.
    + * Infer and track the evolution between the schema as stored in the file and the schema that has been requested by the
    + * reader.
      */
     public class SchemaEvolution {
    +
    +  public static class IllegalEvolutionException extends RuntimeException {
    +    public IllegalEvolutionException(String msg) {
    +      super(msg);
    +    }
    +  }
    +
       private final Map<Integer, TypeDescription> readerToFile;
       private final boolean[] included;
       private final TypeDescription readerSchema;
       private static final Logger LOG = LoggerFactory.getLogger(SchemaEvolution.class);
    +  private static final Pattern missingMetadataPattern = Pattern.compile("_col\\d+");
    --- End diff --
    
    You should check for WriterVersion < HIVE_4243 before bothering to check the column names. The vast majority of files from before HIVE_4243 have the synthetic column names.
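
    A minimal sketch of the ordering suggested above, under stated assumptions: the nested `WriterVersion` enum is a hypothetical stand-in for `OrcFile.WriterVersion` (only the relative order of its constants matters here), and `hasSyntheticNames` is an illustrative helper name, not a method from the pull request. The idea is to compare the writer version first and run the `_col\d+` match only for files written before HIVE_4243.

```java
import java.util.List;
import java.util.regex.Pattern;

final class SyntheticNameCheck {

  // Hypothetical stand-in for OrcFile.WriterVersion; declared oldest first
  // so that compareTo() reflects "written before".
  public enum WriterVersion { ORIGINAL, HIVE_8732, HIVE_4243 }

  // Same pattern as missingMetadataPattern in the diff: _col0, _col1, ...
  private static final Pattern MISSING_METADATA_PATTERN =
      Pattern.compile("_col\\d+");

  /**
   * Returns true only when the file predates HIVE_4243 and every
   * top-level field name looks synthetic.
   */
  public static boolean hasSyntheticNames(WriterVersion version,
                                          List<String> fieldNames) {
    // Cheap check first: files written at or after HIVE_4243 are assumed
    // to carry real column names, so the regex is skipped entirely.
    if (version.compareTo(WriterVersion.HIVE_4243) >= 0) {
      return false;
    }
    if (fieldNames.isEmpty()) {
      return false;
    }
    for (String name : fieldNames) {
      if (!MISSING_METADATA_PATTERN.matcher(name).matches()) {
        return false;
      }
    }
    return true;
  }

  private SyntheticNameCheck() {
    // Static helper only; not meant to be instantiated.
  }
}
```

    With this ordering, a newer file that happens to contain a real column named `_col0` is not mistaken for one with missing metadata, since the version check short-circuits before any name is matched.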



