Hey Cheng,

I didn't mean that Catalyst casting was eager, just that my approaches thus far seem to have been. Maybe I should give a concrete example?
I have columns A, B, C where B is saved as a String but I'd like all references to B to go through a Cast to decimal regardless of the code used on the SchemaRDD. So if someone does a min(B) it uses Decimal ordering instead of String. One approach that I had taken was to do a select of everything with the casts on certain columns, but then when I did a count(literal(1)) on top of that RDD it seemed to bring in the whole row.

Thanks!
-Pat

On Sat, Mar 28, 2015 at 11:35 AM, Cheng Lian <lian.cs....@gmail.com> wrote:

> Hi Pat,
>
> I don't understand what "lazy casting" mean here. Why do you think current
> Catalyst casting is "eager"? Casting happens at runtime, and doesn't
> disable column pruning.
>
> Cheng
>
>
> On 3/28/15 11:26 PM, Patrick Woody wrote:
>
>> Hi all,
>>
>> In my application, we take input from Parquet files where BigDecimals are
>> written as Strings to maintain arbitrary precision.
>>
>> I was hoping to convert these back over to Decimal with Unlimited
>> precision, but I'd still like to maintain the Parquet column pruning (all
>> my attempts thus far seem to bring in the whole Row). Is it possible to do
>> this lazily through catalyst?
>>
>> Basically I'd want to do Cast(col, DecimalType()) whenever col is actually
>> referenced. Any tips on how to approach this would be appreciated.
>>
>> Thanks!
>> -Pat
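
For reference, the ordering difference behind the min(B) example can be shown with a small sketch in plain Python. This is not Spark or Catalyst code; it only illustrates why a minimum over string-encoded decimals can differ from a minimum over the parsed values. The sample values and variable names are made up for illustration.

```python
from decimal import Decimal

# Hypothetical values for column B, stored as strings to keep arbitrary precision.
b_as_strings = ["10.5", "9.25", "100", "2.0"]

# String ordering compares character by character, so "10.5" sorts before "2.0".
min_as_string = min(b_as_strings)

# Decimal ordering compares by numeric value.
min_as_decimal = min(Decimal(v) for v in b_as_strings)

print(min_as_string)   # 10.5
print(min_as_decimal)  # 2.0
```

The two results disagree, which is why the cast to decimal has to sit underneath any aggregate or comparison on B rather than being applied afterwards.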