mwong38 commented on pull request #900: URL: https://github.com/apache/parquet-mr/pull/900#issuecomment-1059918680
After proto3 made every field optional, there was no way to tell whether a primitive had been set or was just holding its default value. That is, you could no longer represent a "nullable" primitive. (They later brought the `optional` keyword back, but the damage was done.) The workaround was to wrap the primitive in a message. For example, to represent a nullable `double`, there is [DoubleValue](https://developers.google.com/protocol-buffers/docs/reference/google.protobuf#doublevalue). It's analogous to Java's boxed `Double`.

Parquet supports nullable primitives natively. If we represented these well-known Protobuf types directly, each value would be nested inside a deeper data structure, which is very inconvenient and a waste of space. What I did here is "unwrap" these primitives and make use of Parquet's nullability.

Additionally, I convert other well-known types such as Timestamp and Date to their Parquet equivalents. Again, rather than representing the data in its raw nested form (seconds/nanos, or year/month/day, etc.), they are converted to Parquet's native or logical representation of Timestamp and Date.

You can turn these features on or off in the `ProtoSchemaConverter` constructor.
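
To illustrate the idea, here is a minimal, self-contained sketch of the unwrapping and conversion described above. It is not the code from this PR: `UnwrapSketch`, `DoubleBox`, and `TimestampParts` are hypothetical stand-ins for the Protobuf wrapper and Timestamp messages, and nanosecond precision for the timestamp is an assumption made only for the example. The date conversion relies on Parquet's DATE type being stored as days since the Unix epoch.

```java
import java.time.LocalDate;

// Hypothetical illustration only; none of these names exist in parquet-mr.
final class UnwrapSketch {
    private UnwrapSketch() {}

    // Stand-in for a wrapper message such as google.protobuf.DoubleValue:
    // a message whose only job is to carry one primitive.
    static final class DoubleBox {
        final double value;
        DoubleBox(double value) { this.value = value; }
    }

    // Stand-in for google.protobuf.Timestamp (seconds and nanos since the epoch).
    static final class TimestampParts {
        final long seconds;
        final int nanos;
        TimestampParts(long seconds, int nanos) {
            this.seconds = seconds;
            this.nanos = nanos;
        }
    }

    // "Unwrap": an absent wrapper becomes a null value in a nullable column,
    // a present wrapper becomes the bare primitive it carries.
    static Double unwrap(DoubleBox box) {
        return box == null ? null : Double.valueOf(box.value);
    }

    // Collapse the nested seconds/nanos pair into a single 64-bit value.
    // Assumes nanosecond precision; throws ArithmeticException on overflow.
    static long toEpochNanos(TimestampParts ts) {
        return Math.addExact(Math.multiplyExact(ts.seconds, 1_000_000_000L), ts.nanos);
    }

    // Collapse year/month/day into days since 1970-01-01, which fits in an int32.
    static int toEpochDay(int year, int month, int day) {
        return Math.toIntExact(LocalDate.of(year, month, day).toEpochDay());
    }
}
```

With this sketch, `UnwrapSketch.unwrap(null)` yields `null` rather than a nested empty group, and `UnwrapSketch.toEpochDay(1970, 1, 2)` yields `1`, so each well-known type ends up as a single flat, nullable column value.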