Could you please provide the full stack trace of the exception? And
what's the Git commit hash of the version you were using?
Cheng
On 7/24/15 6:37 AM, Jerry Lam wrote:
Hi Cheng,
I ran into issues related to ENUM when I tried to use filter
push-down. I'm using Spark 1.5.0 (which contains
Your guess is partly right. Firstly, Spark SQL doesn’t have an
equivalent data type to Parquet BINARY (ENUM), and always falls back to
normal StringType. So in your case, Spark SQL just sees a StringType,
which maps to Parquet BINARY (UTF8), but the underlying data type is
BINARY (ENUM).
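To make the fallback concrete, here is a simplified sketch of the mapping described above (not Spark's actual implementation, just an illustration): a Parquet BINARY column annotated ENUM has no Catalyst equivalent, so it falls back to the same StringType used for BINARY (UTF8), which is why the underlying ENUM annotation gets lost.

```python
# Simplified sketch of the Parquet-BINARY -> Spark SQL type mapping
# described above. This is NOT Spark's real code, only an illustration:
# BINARY (ENUM) has no dedicated Spark SQL type, so it falls back to
# StringType, exactly like BINARY (UTF8).
def catalyst_type_for_binary(logical_type):
    """Map a Parquet BINARY column's logical-type annotation to a
    Spark SQL type name."""
    if logical_type in ("UTF8", "ENUM"):  # ENUM falls back to string
        return "StringType"
    return "BinaryType"                   # un-annotated BINARY stays binary

print(catalyst_type_for_binary("ENUM"))  # -> StringType
print(catalyst_type_for_binary("UTF8"))  # -> StringType
print(catalyst_type_for_binary(None))    # -> BinaryType
```

This also explains the filter push-down problem: Spark generates a string-typed filter, but the file's column is still annotated BINARY (ENUM) underneath.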
On 7/22/15 9:03 AM, Ankit wrote:
Thanks a lot Cheng. So it seems even in Spark 1.3 and 1.4, Parquet
ENUMs were treated as strings in Spark SQL, right? So does this mean
partitioning for enums already works in previous versions too, since
they are just treated as strings?
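On the partitioning question, a rough sketch may help (the directory layout and column name below are hypothetical, and this is a simplification of Spark's partition discovery, not its real code): Hive-style partition discovery parses key=value path segments, so a partition column whose values come from an enum is handled like any other string column.

```python
# Hypothetical sketch of Hive-style partition discovery, as Spark's
# Parquet data source performs it. Partition values are parsed out of
# key=value path segments as plain strings, so enum-valued partition
# columns behave the same as string columns.
def discover_partitions(paths):
    """Extract partition column/value pairs from Hive-style paths."""
    partitions = []
    for p in paths:
        kv = {}
        for segment in p.split("/"):
            if "=" in segment:
                k, v = segment.split("=", 1)
                kv[k] = v  # values stay plain strings, even for enums
        partitions.append(kv)
    return partitions

paths = ["data/status=ACTIVE/part-0.parquet",
         "data/status=DELETED/part-0.parquet"]
print(discover_partitions(paths))
# -> [{'status': 'ACTIVE'}, {'status': 'DELETED'}]
```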
It’s a little bit
Also, is there a good way to verify that the partitioning is
Parquet support for Thrift/Avro/ProtoBuf ENUM types was just added to
the master branch. https://github.com/apache/spark/pull/7048
ENUM types are actually not in the Parquet format spec, which is why we
didn't have them in the first place. Basically, ENUMs are always treated
as UTF8 strings in