Apologies if this isn't quite the right place to ask this question, but I
figured Wes/others might know right off the bat :)


Context:
- Mac OS X laptop
- PySpark: 2.2.0
- PyArrow: 0.6.0
- Pandas: 0.19.2

Issue Explanation:
- I'm converting my Pandas DataFrame to a Parquet file with code very
similar to http://wesmckinney.com/blog/python-parquet-update/
(minimal sketch below)
- My Pandas DataFrame has a datetime index: http_df.index.dtype =
dtype('<M8[ns]')
- When loading the saved Parquet file in Spark, I get the error below
- If I remove that index, everything works fine
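
For reference, here's a minimal sketch of the pipeline (the toy columns,
file names, and app name are made up; it just follows the pattern from the
blog post with a datetime index added):

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
from pyspark.sql import SparkSession

# Small DataFrame with a datetime64[ns] index, like http_df in my notebook
http_df = pd.DataFrame(
    {'host': ['a.example.com', 'b.example.com'], 'bytes': [1024, 2048]},
    index=pd.date_range('2017-09-01', periods=2, freq='S'))
print(http_df.index.dtype)  # dtype('<M8[ns]')

# Write to Parquet via PyArrow; the index ends up as an INT64
# TIMESTAMP_MICROS column in the Parquet file
table = pa.Table.from_pandas(http_df)
pq.write_table(table, 'http.parquet')

# Reading the file back with Spark 2.2.0 raises the error below
spark = SparkSession.builder.appName('parquet-repro').getOrCreate()
spark.read.parquet('http.parquet').show()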

ERROR:
- Py4JJavaError: An error occurred while calling o34.parquet.
: org.apache.spark.SparkException: Job aborted due to stage failure:
Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0
in stage 0.0 (TID 0, localhost, executor driver):
org.apache.spark.sql.AnalysisException: Illegal Parquet type: INT64
(TIMESTAMP_MICROS);
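
For completeness, "removing the index" on my end is literally just dropping
it before the conversion; a minimal sketch, continuing from the one above
(file name made up):

# Dropping the datetime index avoids writing the INT64 TIMESTAMP_MICROS
# column, and Spark 2.2.0 then reads the file fine
flat_df = http_df.reset_index(drop=True)
pq.write_table(pa.Table.from_pandas(flat_df), 'http_no_index.parquet')
spark.read.parquet('http_no_index.parquet').show()

(Newer PyArrow releases have a coerce_timestamps option on pq.write_table
that downcasts to milliseconds, which might let me keep the timestamps, but
I'm not sure whether 0.6.0 supports it.)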

Full Code to reproduce:
 - https://github.com/Kitware/bat/blob/master/notebooks/Bro_to_Parquet.ipynb


Thanks in advance, also big fan of all this stuff... "be the chicken" :)

-Brian
