mwong38 commented on pull request #900:
URL: https://github.com/apache/parquet-mr/pull/900#issuecomment-1059918680


   proto3 made every field optional and stopped tracking presence for 
primitives, so there is no way to tell whether a primitive was explicitly set 
or just holds its default value. That is, you can no longer represent a 
"nullable" primitive. (The `optional` keyword was brought back later, but the 
damage was done.) The solution was to wrap the primitive in a message. For 
example, to represent a nullable `double`, there is 
[DoubleValue](https://developers.google.com/protocol-buffers/docs/reference/google.protobuf#doublevalue). 
It's analogous to Java's boxed `Double`. 
   
   Parquet supports nullable primitives natively. If we represented these 
well-known Protobuf types directly, each value would be nested one level deeper 
in the data structure, which is inconvenient and wastes space. What I did here 
is "unwrap" these primitives and make use of Parquet's nullability.
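
   To illustrate the unwrapping idea, here is a minimal, dependency-free sketch. It is not the code from this PR: `DoubleValueMsg` is a hypothetical stand-in for `google.protobuf.DoubleValue`, `unwrap` is an illustrative helper, and the schema shapes in the comments are only approximate.

```java
public class UnwrapSketch {
    // Hypothetical stand-in for google.protobuf.DoubleValue:
    // a message whose only field is `value`.
    static final class DoubleValueMsg {
        final double value;
        DoubleValueMsg(double value) { this.value = value; }
    }

    // Without unwrapping, the Parquet column would look roughly like:
    //   optional group price { optional double value; }
    // With unwrapping, it collapses to a nullable primitive:
    //   optional double price;
    // An absent wrapper message maps to null; a present one maps to its value.
    static Double unwrap(DoubleValueMsg wrapper) {
        return wrapper == null ? null : wrapper.value;
    }

    public static void main(String[] args) {
        // A wrapper holding 0.0 stays distinguishable from "not set".
        System.out.println(unwrap(new DoubleValueMsg(0.0))); // prints 0.0
        System.out.println(unwrap(null));                    // prints null
    }
}
```

   The point of the sketch is that an explicit `0.0` and an unset field remain distinct after unwrapping, which is exactly what the wrapper message was introduced to preserve.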
   
   Additionally, I convert other well-known types such as Timestamp and Date to 
Parquet's equivalents. Again, rather than representing the data structure in its 
raw nested form (seconds/nanos, or year/month/day, etc.), they are converted to 
Parquet's native or logical representation of Timestamp and Date. You can turn 
these features on or off in the `ProtoSchemaConverter` constructor.
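
   As a rough sketch of what flattening these types involves (again, not the PR's actual code): the method names below are hypothetical, and the choice of microseconds as the timestamp unit is an assumption, since the comment does not say which unit the converter uses.

```java
import java.time.LocalDate;

public class WellKnownTypeSketch {
    // A Protobuf Timestamp carries (seconds, nanos). Flatten it into a single
    // long, as a Parquet TIMESTAMP column stores one int64 per value.
    // Assumption: microsecond precision; sub-microsecond nanos are truncated.
    static long timestampToMicros(long seconds, int nanos) {
        return Math.addExact(Math.multiplyExact(seconds, 1_000_000L), nanos / 1_000);
    }

    // A Date message carries (year, month, day). Parquet's DATE logical type
    // stores the number of days since 1970-01-01 as an int32.
    // Note: LocalDate.of rejects zero fields, so partial dates are not handled here.
    static int dateToEpochDays(int year, int month, int day) {
        return Math.toIntExact(LocalDate.of(year, month, day).toEpochDay());
    }

    public static void main(String[] args) {
        System.out.println(timestampToMicros(1L, 500_000_000)); // prints 1500000
        System.out.println(dateToEpochDays(1970, 1, 2));        // prints 1
    }
}
```

   Each value ends up as a single primitive column entry instead of a nested group with two or three fields.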

