I've been looking at schema evolution lately, and we don't currently support changing a column's physical type when its logical type stays the same. This can be a problem when two systems use different, but valid, physical representations for the same logical type.

Decimal, for example, can be represented as either a binary or a fixed. But if the schema requested for a file (say, binary) doesn't match the type actually in the file (fixed), then the check that verifies all requested columns can be satisfied fails, even though both the requested type and the actual type are valid representations of a decimal.
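
To make the two representations concrete, here is a small sketch. It assumes the decimal's unscaled value is stored as big-endian two's-complement bytes, either at minimal length (binary) or sign-extended to a fixed width (fixed). The class `DecimalEncodings` and its methods are hypothetical helpers for illustration, not Parquet code.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of Parquet.
final class DecimalEncodings {

    // Variable-length form: minimal big-endian two's-complement bytes
    // of the unscaled value.
    static byte[] toBinary(BigDecimal value) {
        return value.unscaledValue().toByteArray();
    }

    // Fixed-length form: the same bytes, sign-extended on the left
    // to exactly `length` bytes.
    static byte[] toFixed(BigDecimal value, int length) {
        byte[] minimal = value.unscaledValue().toByteArray();
        if (minimal.length > length) {
            throw new IllegalArgumentException(
                "value needs " + minimal.length + " bytes, only " + length + " available");
        }
        byte[] fixed = new byte[length];
        byte pad = (byte) (value.signum() < 0 ? 0xFF : 0x00);
        Arrays.fill(fixed, 0, length - minimal.length, pad);
        System.arraycopy(minimal, 0, fixed, length - minimal.length, minimal.length);
        return fixed;
    }

    // Either form decodes the same way, because sign extension does not
    // change the two's-complement value.
    static BigDecimal fromBytes(byte[] bytes, int scale) {
        return new BigDecimal(new BigInteger(bytes), scale);
    }
}
```

Under that assumption, a reader that asked for binary could still decode a fixed column correctly, since both byte arrays decode to the same value.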

We can fix this by adding logic to the `checkContains` methods in the Type classes, plus support in the converters. But I'm wondering if we shouldn't take a closer look at projection and schema evolution in general at this point.
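
A rough sketch of what such a relaxed check might look like follows. Everything here (`ProjectionCheck`, `Column`, the `Physical` and `Logical` enums, `canSatisfy`) is a hypothetical stand-in written for this example. It is not the actual `checkContains` code and does not use the real Type classes.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical model of a projection check; names are illustrative only.
final class ProjectionCheck {

    enum Physical { BINARY, FIXED, INT32, INT64 }

    enum Logical { NONE, DECIMAL, UTF8 }

    static final class Column {
        final Physical physical;
        final Logical logical;
        final int precision; // only meaningful for DECIMAL
        final int scale;     // only meaningful for DECIMAL

        Column(Physical physical, Logical logical, int precision, int scale) {
            this.physical = physical;
            this.logical = logical;
            this.precision = precision;
            this.scale = scale;
        }
    }

    // Assumption: these physical types are interchangeable for a decimal.
    private static final Set<Physical> DECIMAL_PHYSICAL =
        EnumSet.of(Physical.BINARY, Physical.FIXED);

    // True if a column stored as `actual` can satisfy a request for `requested`.
    static boolean canSatisfy(Column requested, Column actual) {
        if (requested.logical != actual.logical) {
            return false;
        }
        if (requested.logical == Logical.DECIMAL
                && (requested.precision != actual.precision
                    || requested.scale != actual.scale)) {
            return false;
        }
        if (requested.physical == actual.physical) {
            return true; // the strict case: physical types match exactly
        }
        // Relaxed case: ignore the physical mismatch only when the logical
        // type is a decimal and both encodings are valid for it.
        return requested.logical == Logical.DECIMAL
            && DECIMAL_PHYSICAL.contains(requested.physical)
            && DECIMAL_PHYSICAL.contains(actual.physical);
    }
}
```

The point of the sketch is that the physical type is ignored only when the logical type says both encodings mean the same thing. The converters would still need to handle whichever encoding is actually in the file.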

Are there other ways to solve this problem? Can we do projection differently, so we don't have to ignore the physical type of a requested column in some cases? What are the rules for valid projection?

Thanks!

rb


--
Ryan Blue
Software Engineer
Cloudera, Inc.
