There should be a central place where type equivalence and conversion are defined. Converters could then reuse those definitions, which would minimize the amount of work required. When projecting, Parquet deserializes the physical types it knows about, and the converter applies the proper type conversion. This could be implemented as a set of reusable PrimitiveConverters that know how to convert from a given physical type to a logical type. They can be composed with the appropriate converter if there's a more specific type for a particular framework.
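
A rough sketch of the idea, using decimal as the example. All names below (DecimalConversions, BytesConverter, decimalFromBytes, then) are hypothetical and are not existing parquet-mr classes or methods. It assumes that binary and fixed both store the unscaled decimal value as big-endian two's-complement bytes, so a single bytes-to-decimal converter can serve either physical type.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical names throughout; this is not the parquet-mr API.
final class DecimalConversions {
    private DecimalConversions() {}

    // Stand-in for a primitive converter that receives byte-array values,
    // whether the column is stored as binary or as fixed.
    interface BytesConverter {
        void addBytes(byte[] value);
    }

    // Reusable physical-to-logical step: interprets the bytes as an
    // unscaled big-endian two's-complement integer and applies the scale.
    static BytesConverter decimalFromBytes(int scale, Consumer<BigDecimal> next) {
        return bytes -> next.accept(new BigDecimal(new BigInteger(bytes), scale));
    }

    // Composition step: adapts the logical value to a more specific
    // framework type before handing it to the framework's sink.
    static <T> Consumer<BigDecimal> then(Function<BigDecimal, T> toFrameworkType,
                                         Consumer<T> sink) {
        return value -> sink.accept(toFrameworkType.apply(value));
    }
}
```

With something like this, a framework would build its converter once, for example decimalFromBytes(2, then(BigDecimal::toPlainString, sink)), and the same instance would accept values from a file written with binary or with fixed, since the physical-to-logical step lives in one place.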

On Mon, May 18, 2015 at 1:43 PM, Ryan Blue <[email protected]> wrote:

> I've been looking at schema evolution lately, and we don't currently
> support changing physical types when a logical type does not change. This
> could be a problem when two different systems have different, but valid,
> representations for a logical type.
>
> Decimal, for example, can be represented either with a binary or a fixed.
> But if the requested schema for a file (say, binary) doesn't match the
> underlying type (fixed) then the check that verifies all columns can be
> satisfied fails, even though both requested type and actual type are valid.
>
> We can fix this by adding logic to the `checkContains` methods in the Type
> classes, plus support in the converters. But I'm wondering if we shouldn't
> take a closer look at projection and schema evolution in general at this
> point.
>
> Are there other ways to solve this problem? Can we do projection
> differently, so we don't have to ignore the physical type of a requested
> column in some cases? What are the rules for valid projection?
>
> Thanks!
>
> rb
>
>
> --
> Ryan Blue
> Software Engineer
> Cloudera, Inc.
