[ https://issues.apache.org/jira/browse/ORC-54?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15364736#comment-15364736 ]
ASF GitHub Bot commented on ORC-54:
-----------------------------------

Github user omalley commented on a diff in the pull request:

    https://github.com/apache/orc/pull/40#discussion_r69778493

    --- Diff: java/core/src/java/org/apache/orc/impl/SchemaEvolution.java ---
    @@ -18,25 +18,33 @@
     package org.apache.orc.impl;

    -import java.io.IOException;
     import java.util.ArrayList;
     import java.util.HashMap;
     import java.util.List;
     import java.util.Map;
    +import java.util.regex.Pattern;

     import org.apache.orc.TypeDescription;
     import org.slf4j.Logger;
     import org.slf4j.LoggerFactory;

     /**
    - * Take the file types and the (optional) configuration column names/types and see if there
    - * has been schema evolution.
    + * Infer and track the evolution between the schema as stored in the file and the schema that has been requested by the
    + * reader.
      */
     public class SchemaEvolution {
    +
    +  public static class IllegalEvolutionException extends RuntimeException {
    +    public IllegalEvolutionException(String msg) {
    +      super(msg);
    +    }
    +  }
    +
       private final Map<Integer, TypeDescription> readerToFile;
       private final boolean[] included;
       private final TypeDescription readerSchema;
       private static final Logger LOG = LoggerFactory.getLogger(SchemaEvolution.class);
    +  private static final Pattern missingMetadataPattern = Pattern.compile("_col\\d+");
    --- End diff --

    You should check for WriterVersion < HIVE_4243 before bothering to check the column names. The vast majority of files from before HIVE_4243 have the synthetic column names.


> Evolve schemas based on field name rather than index
> ----------------------------------------------------
>
>                 Key: ORC-54
>                 URL: https://issues.apache.org/jira/browse/ORC-54
>             Project: Orc
>          Issue Type: Improvement
>            Reporter: Mark Wagner
>            Assignee: Mark Wagner
>
> Schema evolution as it stands today allows adding fields to the end of
> schemas or removing them from the end. However, because it is based on the
> index of the column, you can only ever add or remove -- not both.
> ORC files have the full schema information of their contents, so there's
> actually enough metadata to support changing columns anywhere in the schema.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
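
For illustration, the check suggested in the review comment above could look roughly like the sketch below. This is not the code from the pull request: the `SyntheticNameCheck` class, its `hasSyntheticNames` helper, and the nested `WriterVersion` enum are hypothetical stand-ins so that the example compiles on its own. It assumes that files written at or after HIVE_4243 carry real column names, so the `_col\d+` pattern is only consulted for older files.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class SyntheticNameCheck {

  // Hypothetical stand-in for the writer version recorded in an ORC file.
  // Only the relative order of the constants matters for this sketch.
  enum WriterVersion { ORIGINAL, HIVE_8732, HIVE_4243 }

  // Same pattern as in the diff: synthetic names look like _col0, _col1, ...
  private static final Pattern MISSING_METADATA_PATTERN = Pattern.compile("_col\\d+");

  // Returns true only when the file predates HIVE_4243 and every field
  // name matches the synthetic pattern.
  static boolean hasSyntheticNames(WriterVersion version, List<String> fieldNames) {
    // Assumption: files at or after HIVE_4243 have real column names,
    // so the per-name regex check can be skipped entirely.
    if (version.compareTo(WriterVersion.HIVE_4243) >= 0) {
      return false;
    }
    for (String name : fieldNames) {
      if (!MISSING_METADATA_PATTERN.matcher(name).matches()) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    List<String> names = Arrays.asList("_col0", "_col1");
    System.out.println(hasSyntheticNames(WriterVersion.ORIGINAL, names));
    System.out.println(hasSyntheticNames(WriterVersion.HIVE_4243, names));
  }
}
```

Putting the version comparison first means the common case of a recent file never touches the regex, which is the point of the review comment.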