Yun,

We've defined how decimals should be stored in the spec [1], and some clients, like Hive, are already implementing Decimal that way. The spec uses a per-column scale, which is what SQL requires and, I think, what most people use. (We can work on a spec for per-value scale if you need it.)
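
For reference, here's a rough sketch of what a decimal column looks like when declared through parquet-mr's schema builder. The column name and the precision/scale/width values here are made up for illustration:

import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.OriginalType;
import org.apache.parquet.schema.Types;
import static org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName.FIXED_LEN_BYTE_ARRAY;

// A single DECIMAL(38, 9) column stored as a 16-byte fixed field.
// The scale (9) is fixed for the column, not stored per value.
MessageType schema = Types.buildMessage()
    .required(FIXED_LEN_BYTE_ARRAY).length(16)
        .as(OriginalType.DECIMAL).precision(38).scale(9)
        .named("amount")
    .named("example");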

It is up to each object model (e.g., Avro, Protobuf, Thrift, Hive) to implement support for decimal. I'm not sure how that would be done in protobuf, but we certainly welcome contributions to make it happen. I've also built decimal support in parquet-avro, which I will get posted as a PR now that the upstream Avro 1.8.0 release is out. Maybe that will help you.
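
If someone does take a crack at the protobuf side, the core of the conversion is turning the BigDecimal into the two's-complement big-endian unscaled bytes that the spec calls for. Here's a minimal sketch; toUnscaledBytes is a hypothetical helper, not an existing parquet-proto API, and a fixed-width column would additionally need sign-extension padding:

import java.math.BigDecimal;
import java.math.BigInteger;

// Rescale the value to the column's fixed scale, then emit the
// unscaled value as two's-complement big-endian bytes, per the spec.
// setScale with no rounding mode throws ArithmeticException if the
// value can't be represented exactly at the column's scale.
static byte[] toUnscaledBytes(BigDecimal value, int columnScale) {
  BigInteger unscaled = value.setScale(columnScale).unscaledValue();
  return unscaled.toByteArray();
}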

rb


[1]: https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#decimal

On 01/29/2016 08:44 PM, Yun Liaw wrote:
Hi folks,

I am currently working on transforming a protobuf object, a custom
message that wraps Java's BigDecimal, into Parquet's decimal type.

The custom message is something like this:

message BDecimal {
   required int32 scale = 1;
   required BInteger int_val = 2;
}

message BInteger {
   required bytes value = 1;
}

ref: 
http://stackoverflow.com/questions/1051732/what-is-the-best-approach-for-serializing-bigdecimal-biginteger-to-protocolbuffe

I have tried using parquet-proto to convert the protobuf object to
Parquet, but it writes the Parquet file with the same schema as above.
What I want is for every BDecimal encountered in the protobuf to be
converted to Parquet's decimal type.

Any suggestions on how to do this?

Best,
Yun



--
Ryan Blue
Software Engineer
Cloudera, Inc.
