Re: Partition parquet data by ENUM column

2015-07-24 Thread Cheng Lian
Could you please provide the full stack trace of the exception? And what's the Git commit hash of the version you were using?

Cheng

On 7/24/15 6:37 AM, Jerry Lam wrote:
> Hi Cheng, I ran into issues related to ENUM when I tried to use filter pushdown. I'm using Spark 1.5.0 (which contains

Re: Partition parquet data by ENUM column

2015-07-24 Thread Cheng Lian
Your guess is partly right. First, Spark SQL doesn't have a data type equivalent to Parquet BINARY (ENUM), so it always falls back to the normal StringType. In your case, Spark SQL just sees a StringType, which maps to Parquet BINARY (UTF8), while the underlying data type is actually BINARY (ENUM).
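The mismatch described above can be modelled in a few lines. This is a hypothetical, simplified sketch, not Spark's actual schema converter: `ParquetColumn`, `to_spark_type` and `pushdown_column_for` are illustrative names, and only the two BINARY annotations mentioned in this thread are modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ParquetColumn:
    # Hypothetical, simplified Parquet column: physical type plus an
    # optional logical annotation such as "UTF8" or "ENUM".
    physical: str
    annotation: Optional[str]


def to_spark_type(col: ParquetColumn) -> str:
    """Map a Parquet column to a Spark SQL type name (illustrative only)."""
    if col.physical == "BINARY" and col.annotation in ("UTF8", "ENUM"):
        # Spark SQL has no ENUM type, so ENUM falls back to StringType.
        return "StringType"
    raise ValueError(f"not modelled in this sketch: {col}")


def pushdown_column_for(spark_type: str) -> ParquetColumn:
    """Parquet column type that a StringType filter predicate assumes."""
    if spark_type == "StringType":
        return ParquetColumn("BINARY", "UTF8")
    raise ValueError(f"not modelled in this sketch: {spark_type}")


file_column = ParquetColumn("BINARY", "ENUM")   # what the file actually contains
spark_type = to_spark_type(file_column)          # what Spark SQL sees
expected = pushdown_column_for(spark_type)       # what the pushed-down filter assumes
print(spark_type, expected == file_column)       # prints: StringType False
```

The round trip ENUM → StringType → UTF8 does not return to ENUM, which is the kind of disagreement that can surface when a filter is pushed down to the Parquet reader.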

Re: Partition parquet data by ENUM column

2015-07-21 Thread Cheng Lian
On 7/22/15 9:03 AM, Ankit wrote:
> Thanks a lot, Cheng. So it seems that even in Spark 1.3 and 1.4, Parquet ENUMs were treated as strings in Spark SQL, right? Does this mean partitioning for enums already works in previous versions too, since they are just treated as strings?

It's a little bit

Re: Partition parquet data by ENUM column

2015-07-21 Thread Ankit
Thanks a lot, Cheng. So it seems that even in Spark 1.3 and 1.4, Parquet ENUMs were treated as strings in Spark SQL, right? Does this mean partitioning for enums already works in previous versions too, since they are just treated as strings? Also, is there a good way to verify that the partitioning is
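Since an ENUM column reaches Spark SQL as a plain string, partitioning by it should produce the usual Hive-style `column=value` directory layout that Spark's partition discovery reads back. The sketch below only simulates that layout on the local file system so it can be inspected by listing directories; `partition_dir`, `discover_partitions`, the `status` column and its values are all hypothetical, and it does not model Spark's escaping of special characters in values.

```python
import os
import tempfile


def partition_dir(root: str, column: str, value: str) -> str:
    # Hive-style layout: one sub-directory per distinct value, named column=value.
    return os.path.join(root, f"{column}={value}")


def discover_partitions(root: str) -> dict:
    """Recover {column: sorted values} from column=value directory names."""
    found = {}
    for name in sorted(os.listdir(root)):
        if os.path.isdir(os.path.join(root, name)) and "=" in name:
            column, value = name.split("=", 1)
            found.setdefault(column, []).append(value)
    return found


with tempfile.TemporaryDirectory() as root:
    # Hypothetical ENUM symbols, already surfaced as plain strings.
    for status in ["ACTIVE", "DELETED", "PENDING"]:
        os.makedirs(partition_dir(root, "status", status))
    print(discover_partitions(root))  # {'status': ['ACTIVE', 'DELETED', 'PENDING']}
```

Listing the output directory this way is one simple check that one directory per enum symbol was written; it says nothing about whether queries actually prune partitions.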

Re: Partition parquet data by ENUM column

2015-07-21 Thread Cheng Lian
Parquet support for Thrift/Avro/Protobuf ENUM types has just been added to the master branch: https://github.com/apache/spark/pull/7048. ENUM types are actually not in the Parquet format spec, which is why we didn't support them in the first place. Basically, ENUMs are always treated as UTF8 strings in
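Treating an ENUM as a UTF8 string amounts to ignoring the ENUM annotation and decoding the stored bytes as text. A minimal sketch, assuming the writer stores each ENUM value as the UTF-8 bytes of its symbol name; `read_enum_as_string` and the sample values are hypothetical and not part of any Spark or Parquet API.

```python
from typing import List


def read_enum_as_string(raw_values: List[bytes]) -> List[str]:
    # Ignore the ENUM annotation and decode exactly as for a UTF8 column.
    return [raw.decode("utf-8") for raw in raw_values]


# Hypothetical raw BINARY (ENUM) values, one per row.
raw = [b"RED", b"GREEN", b"RED"]
colors = read_enum_as_string(raw)
print([c for c in colors if c == "RED"])  # ['RED', 'RED']
```

Once decoded, ordinary string comparisons (and string partitioning) apply to the enum symbols unchanged.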