The fixed-length binary type can be sized to fewer bytes than an int64 when the 
precision is small, though many of the int64 encodings can probably do the right 
thing (compress small values) anyway. We can look into supporting multiple 
representations -- the spec does say that a reader should at least be able to 
read decimals stored as int32s and int64s.
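
As a rough illustration of the size difference (just a standalone sketch, not 
code from the patch -- the object and method names are made up for the example), 
a writer can size a fixed-length binary decimal column down to the smallest 
number of bytes whose two's-complement range covers 10^precision - 1:

object DecimalBytesSketch {

  // Fewest bytes that can hold any unscaled value of the given precision in
  // two's-complement form, i.e. enough bits for +/-(10^precision - 1).
  def minBytesForPrecision(precision: Int): Int = {
    val maxUnscaled = BigInt(10).pow(precision) - 1
    (maxUnscaled.bitLength + 1 + 7) / 8   // +1 bit for the sign, round up to bytes
  }

  // Pack a long unscaled value into a big-endian two's-complement array of
  // exactly numBytes bytes -- the layout a fixed-length binary column stores.
  def toFixedBytes(unscaled: Long, numBytes: Int): Array[Byte] = {
    val out = new Array[Byte](numBytes)
    var v = unscaled
    var i = numBytes - 1
    while (i >= 0) {
      out(i) = (v & 0xff).toByte
      v >>= 8   // arithmetic shift, so negative values keep their sign bits
      i -= 1
    }
    out
  }

  def main(args: Array[String]): Unit = {
    // decimal(9, 2): 4 bytes per value instead of the 8 an int64 column would use.
    val n = minBytesForPrecision(9)
    println(s"precision 9 -> $n bytes")               // prints: precision 9 -> 4
    println(toFixedBytes(-12345L, n).mkString(","))   // -1,-1,-49,-57
  }
}

For precision 18 (the most that fits in a long) this works out to 8 bytes, the 
same as an int64, but anything smaller comes out ahead.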

Matei

On Oct 12, 2014, at 8:20 PM, Michael Allman <mich...@videoamp.com> wrote:

> Hi Matei,
> 
> Thanks, I can see you've been hard at work on this! I examined your patch and 
> do have a question. It appears you're limiting the precision of decimals 
> written to parquet to those that will fit in a long, yet you're writing the 
> values as a parquet binary type. Why not write them using the int64 parquet 
> type instead?
> 
> Cheers,
> 
> Michael
> 
> On Oct 12, 2014, at 3:32 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
> 
>> Hi Michael,
>> 
>> I've been working on this in my repo: 
>> https://github.com/mateiz/spark/tree/decimal. I'll make some pull requests 
>> with these features soon, but meanwhile you can try this branch. See 
>> https://github.com/mateiz/spark/compare/decimal for the individual commits 
>> that went into it. It has exactly the precision stuff you need, plus some 
>> optimizations for working on decimals.
>> 
>> Matei
>> 
>> On Oct 12, 2014, at 1:51 PM, Michael Allman <mich...@videoamp.com> wrote:
>> 
>>> Hello,
>>> 
>>> I'm interested in reading/writing parquet SchemaRDDs that support the 
>>> Parquet Decimal converted type. The first thing I did was update the Spark 
>>> parquet dependency to version 1.5.0, as this version introduced support for 
>>> decimals in parquet. However, conversion between the catalyst decimal type 
>>> and the parquet decimal type is complicated by the fact that the catalyst 
>>> type does not specify a decimal precision and scale but the parquet type 
>>> requires them.
>>> 
>>> I'm wondering if perhaps we could add an optional precision and scale to 
>>> the catalyst decimal type? The catalyst decimal type would have unspecified 
>>> precision and scale by default for backwards compatibility, but users who 
>>> want to serialize a SchemaRDD with decimal(s) to parquet would have to 
>>> narrow their decimal type(s) by specifying a precision and scale.
>>> 
>>> Thoughts?
>>> 
>>> Michael
>>> 
>> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org
