> Dear All,
>
> We are facing the following issue:
>
> We have a Spark SQL DataFrame that contains a column of StringType.
>
> This column actually holds timestamp data.
>
> Because the DataFrame doesn't support the format in which we want the
> timestamp, we are using the string data type for it.
>
> All this is fine until we try to write this DataFrame to an Impala
> table that stores its data in Parquet file format.
>
> This is the code for the last step:
>
> df.write.save("hdfs://hadoop-test.css.org:8020/user/hive/warehouse/twitter.db/land2",
>               format="parquet", mode="append", partitionBy=None)
> Here is the error we are getting:
>
> java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to
> org.apache.hadoop.hive.serde2.io.TimestampWritable
> When we change the format to "text":
>
> df.write.save("hdfs://hadoop-test.css.org:8020/user/hive/warehouse/twitter.db/land2",
>               format="text", mode="append", partitionBy=None)
>
> we get no errors and the write is successful (provided the Impala table
> is also in TextFile format).
>
>
> Why is this? We have tried everything, including:
>
> http://stackoverflow.com/questions/31482798/save-spark-dataframe-to-hive-table-not-readable-because-parquet-not-a-sequence
>
> This gives us an error:
> TypeError: 'JavaPackage' object is not callable
>
>
> Any help would be deeply appreciated.
>
> Thanks & Kind Regards,
> Salman Ahmed
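A note on the exception above: when writing through Hive into a Parquet table, the column's type in the DataFrame must match the table schema, so a StringType column cannot land in a column the table declares as TIMESTAMP (hence `Text cannot be cast to TimestampWritable`); the text format works because TextFile stores everything as strings. One possible workaround (a sketch, not from the original thread) is to normalize the custom-format strings into the standard `yyyy-MM-dd HH:mm:ss` layout that Hive/Impala TIMESTAMP columns parse, then cast the column before writing. The custom format below (`25/Dec/2015 13:30:00`) is purely hypothetical:

```python
from datetime import datetime

# Hypothetical custom layout the strings are assumed to be stored in.
CUSTOM_FMT = "%d/%b/%Y %H:%M:%S"
# Standard layout that Hive/Impala TIMESTAMP (and Spark's TimestampType) accept.
HIVE_FMT = "%Y-%m-%d %H:%M:%S"

def normalize_ts(s):
    """Re-render a custom-format timestamp string in Hive's expected layout."""
    return datetime.strptime(s, CUSTOM_FMT).strftime(HIVE_FMT)

print(normalize_ts("25/Dec/2015 13:30:00"))  # 2015-12-25 13:30:00
```

In PySpark the same normalization could be applied with a UDF wrapping a function like `normalize_ts`, followed by `.cast("timestamp")` on the column (or, on newer Spark versions, via `unix_timestamp`/`from_unixtime` with a SimpleDateFormat pattern), so that the DataFrame column is a real timestamp before `df.write.save(..., format="parquet")`.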
>
>
