I've been looking at schema evolution lately, and we don't currently
support changing physical types when a logical type does not change.
This could be a problem when two different systems have different, but
valid, representations for a logical type.
Decimal, for example, can be represented as either a binary or a
fixed. But if the requested schema for a file (say, binary) doesn't
match the underlying type (fixed), then the check that verifies all
columns can be satisfied fails, even though both the requested type and
the actual type are valid.
We can fix this by adding logic to the `checkContains` methods in the
Type classes, plus support in the converters. But I'm wondering if we
shouldn't take a closer look at projection and schema evolution in
general at this point.
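
To make the idea concrete, here is a rough sketch of what a relaxed
check could look like. This is not the actual `checkContains` code: the
`Physical`, `Logical`, `Column`, and `ProjectionCheck` names are made up
for illustration, and the set of interchangeable representations is an
assumption that covers only the decimal case described above.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical stand-ins for physical and logical types;
// these are NOT the real Parquet classes.
enum Physical { INT32, INT64, BINARY, FIXED_LEN_BYTE_ARRAY }

enum Logical { NONE, DECIMAL }

final class Column {
    final String name;
    final Physical physical;
    final Logical logical;

    Column(String name, Physical physical, Logical logical) {
        this.name = name;
        this.physical = physical;
        this.logical = logical;
    }
}

final class ProjectionCheck {
    // Assumption: for a given logical type, these physical types are
    // treated as interchangeable. Only the decimal case is listed.
    static Set<Physical> validRepresentations(Logical logical) {
        switch (logical) {
            case DECIMAL:
                return EnumSet.of(Physical.BINARY, Physical.FIXED_LEN_BYTE_ARRAY);
            default:
                return EnumSet.noneOf(Physical.class);
        }
    }

    // Strict check: the requested physical type must match the file exactly.
    static boolean strictContains(Column file, Column requested) {
        return file.name.equals(requested.name)
            && file.physical == requested.physical;
    }

    // Relaxed check: also accept a different physical type when both columns
    // share a logical type and both physical types are valid for it.
    static boolean relaxedContains(Column file, Column requested) {
        if (!file.name.equals(requested.name)) {
            return false;
        }
        if (file.physical == requested.physical) {
            return true;
        }
        if (file.logical != requested.logical) {
            return false;
        }
        Set<Physical> valid = validRepresentations(file.logical);
        return valid.contains(file.physical) && valid.contains(requested.physical);
    }

    public static void main(String[] args) {
        Column inFile = new Column("price", Physical.FIXED_LEN_BYTE_ARRAY, Logical.DECIMAL);
        Column requested = new Column("price", Physical.BINARY, Logical.DECIMAL);
        System.out.println("strict:  " + strictContains(inFile, requested));
        System.out.println("relaxed: " + relaxedContains(inFile, requested));
    }
}
```

The sketch only covers the schema check. The converters would still
need to read a fixed column and hand back values in the requested binary
form, which is not shown here.
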
Are there other ways to solve this problem? Can we do projection
differently, so we don't have to ignore the physical type of a requested
column in some cases? What are the rules for valid projection?
Thanks!
rb
--
Ryan Blue
Software Engineer
Cloudera, Inc.