Hi Matei,

Another thing occurred to me. Will the binary format you're writing sort the data in numeric order? Or would the decimals have to be decoded for comparison?
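To make the question concrete, here is a rough sketch (purely illustrative, not taken from your branch) of the kind of order-preserving encoding I have in mind: write the decimal's unscaled long big-endian with the sign bit flipped, so that a plain unsigned comparison of the raw bytes agrees with numeric order.

import java.nio.ByteBuffer

// Illustrative only: encode a decimal's unscaled long so that comparing the
// raw bytes as unsigned values gives the same ordering as the numbers.
// Flipping the sign bit maps negatives below non-negatives; big-endian byte
// order then makes lexicographic comparison match numeric comparison.
def encodeOrderPreserving(unscaled: Long): Array[Byte] =
  ByteBuffer.allocate(8).putLong(unscaled ^ Long.MinValue).array()

// Unsigned lexicographic comparison, as a reader sorting raw binary might do.
def compareBytes(a: Array[Byte], b: Array[Byte]): Int = {
  var i = 0
  while (i < a.length && i < b.length) {
    val c = (a(i) & 0xff) - (b(i) & 0xff)
    if (c != 0) return c
    i += 1
  }
  a.length - b.length
}

// Sanity check: the byte ordering agrees with the numeric ordering.
val unscaledValues = Seq(-100000L, -1L, 0L, 1L, 42L, 99999L)
assert(unscaledValues.map(encodeOrderPreserving).sliding(2).forall {
  case Seq(a, b) => compareBytes(a, b) < 0
})

(If the bytes were plain two's-complement big-endian instead, negative values would compare as larger than positive ones under an unsigned byte comparison, which is what prompted the question.)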
Cheers,

Michael

> On Oct 12, 2014, at 10:48 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
>
> The fixed-length binary type can hold fewer bytes than an int64, though many encodings of int64 can probably do the right thing. We can look into supporting multiple ways to do this -- the spec does say that you should at least be able to read int32s and int64s.
>
> Matei
>
> On Oct 12, 2014, at 8:20 PM, Michael Allman <mich...@videoamp.com> wrote:
>
>> Hi Matei,
>>
>> Thanks, I can see you've been hard at work on this! I examined your patch and do have a question. It appears you're limiting the precision of decimals written to parquet to those that will fit in a long, yet you're writing the values as a parquet binary type. Why not write them using the int64 parquet type instead?
>>
>> Cheers,
>>
>> Michael
>>
>> On Oct 12, 2014, at 3:32 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
>>
>>> Hi Michael,
>>>
>>> I've been working on this in my repo: https://github.com/mateiz/spark/tree/decimal. I'll make some pull requests with these features soon, but meanwhile you can try this branch. See https://github.com/mateiz/spark/compare/decimal for the individual commits that went into it. It has exactly the precision stuff you need, plus some optimizations for working on decimals.
>>>
>>> Matei
>>>
>>> On Oct 12, 2014, at 1:51 PM, Michael Allman <mich...@videoamp.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> I'm interested in reading/writing parquet SchemaRDDs that support the Parquet Decimal converted type. The first thing I did was update the Spark parquet dependency to version 1.5.0, as this version introduced support for decimals in parquet. However, conversion between the catalyst decimal type and the parquet decimal type is complicated by the fact that the catalyst type does not specify a decimal precision and scale but the parquet type requires them.
>>>>
>>>> I'm wondering if perhaps we could add an optional precision and scale to the catalyst decimal type? The catalyst decimal type would have unspecified precision and scale by default for backwards compatibility, but users who want to serialize a SchemaRDD with decimal(s) to parquet would have to narrow their decimal type(s) by specifying a precision and scale.
>>>>
>>>> Thoughts?
>>>>
>>>> Michael
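For reference, a minimal sketch of the optional precision/scale idea from the original proposal quoted above. The names below are illustrative only, not the actual Catalyst types: precision and scale are unspecified by default for backwards compatibility, and a writer targeting Parquet would require a narrowed type.

// Illustrative sketch only -- not the real Catalyst DecimalType API.
// Precision and scale are optional: unspecified by default for backwards
// compatibility, but required before writing decimals to Parquet.
case class DecimalTypeSketch(precisionAndScale: Option[(Int, Int)] = None) {
  def hasFixedPrecision: Boolean = precisionAndScale.isDefined

  // Narrow to a concrete precision/scale, as a Parquet writer would require.
  def narrowTo(precision: Int, scale: Int): DecimalTypeSketch = {
    require(scale <= precision, "scale cannot exceed precision")
    DecimalTypeSketch(Some((precision, scale)))
  }
}

// Usage: the default keeps today's behaviour; writing to Parquet needs a
// narrowed type such as decimal(10, 2).
val unlimited = DecimalTypeSketch()
val forParquet = unlimited.narrowTo(10, 2)

With something like this, existing code that never touches Parquet keeps working unchanged, while the Parquet writer can refuse to serialize a decimal whose precision and scale were never specified.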