Apologies if this isn't quite the right place to ask this question, but I figured Wes/others might know right off the bat :)
Context:
- Mac OS X laptop
- PySpark: 2.2.0
- PyArrow: 0.6.0
- Pandas: 0.19.2

Issue Explanation:
- I'm converting my Pandas DataFrame to a Parquet file with code very similar to http://wesmckinney.com/blog/python-parquet-update/
- My Pandas DataFrame has a datetime index: http_df.index.dtype = dtype('<M8[ns]')
- When loading the saved Parquet file in Spark, I get the error below (a minimal sketch of the round-trip is at the bottom of this message).
- If I remove that index, everything works fine.

ERROR:
Py4JJavaError: An error occurred while calling o34.parquet.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): org.apache.spark.sql.AnalysisException: Illegal Parquet type: INT64 (TIMESTAMP_MICROS);

Full code to reproduce:
- https://github.com/Kitware/bat/blob/master/notebooks/Bro_to_Parquet.ipynb

Thanks in advance, also big fan of all this stuff... "be the chicken" :)
-Brian
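
P.S. For quick reference, here's a minimal sketch of the write/read round-trip I'm doing. The DataFrame contents and file name are just placeholders (the real code is in the notebook linked above):

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
from pyspark.sql import SparkSession

# Stand-in for my http_df: a small DataFrame with a datetime index
df = pd.DataFrame(
    {"bytes": [100, 200, 300]},
    index=pd.date_range("2017-01-01", periods=3, freq="S"),
)
print(df.index.dtype)  # dtype('<M8[ns]')

# Write to Parquet via pyarrow (this step succeeds)
table = pa.Table.from_pandas(df)
pq.write_table(table, "http.parquet")

# Read the file back with Spark 2.2.0 -- this is where the
# "Illegal Parquet type: INT64 (TIMESTAMP_MICROS)" error shows up
spark = SparkSession.builder.appName("parquet_repro").getOrCreate()
spark_df = spark.read.parquet("http.parquet")
spark_df.show()

If I drop the datetime index before writing, the Spark read works fine, as noted above.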