[ https://issues.apache.org/jira/browse/ARROW-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16265989#comment-16265989 ]
Licht Takeuchi commented on ARROW-1436:
---------------------------------------

[~wesmckinn] This seems already fixed.

* Python code:
{code:python}
import numpy as np
import pandas as pd
import pyarrow as pa
from pyarrow import parquet as pq

t = pa.timestamp('ns')
start = pd.Timestamp('2001-01-01').value
data = np.array([start, start + 1000, start + 2000], dtype='int64')
a = pa.array(data, type=t)
table = pa.Table.from_arrays([a], ['ts'])
pq.write_table(table, 'test-1.parquet', use_deprecated_int96_timestamps=True)
pq.write_table(table, 'test-2.parquet', use_deprecated_int96_timestamps=False)
{code}
* Spark code:
{code:java}
import org.apache.parquet.hadoop.ParquetFileReader
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.conf.Configuration

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._

val conf = sc.hadoopConfiguration
val fs = FileSystem.get(conf)

// INT96 timestamp case
ParquetFileReader.readAllFootersInParallel(conf, fs.getFileStatus(new Path("test-1.parquet")))
var df = sqlContext.read.parquet("../../../arrow/python/test-1.parquet")
df.take(3)

// INT64 timestamp case
ParquetFileReader.readAllFootersInParallel(conf, fs.getFileStatus(new Path("test-2.parquet")))
df = sqlContext.read.parquet("../../../arrow/python/test-2.parquet")
df.take(3)
{code}

> PyArrow Timestamps written to Parquet as INT96 appear in Spark as 'bigint'
> --------------------------------------------------------------------------
>
>                 Key: ARROW-1436
>                 URL: https://issues.apache.org/jira/browse/ARROW-1436
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Format, Python
>    Affects Versions: 0.6.0
>            Reporter: Lucas Pickup
>            Assignee: Licht Takeuchi
>             Fix For: 0.8.0
>
>
> When using the 'use_deprecated_int96_timestamps' option to write Parquet
> files compatible with Spark <2.2.0 (which does not support INT64-backed
> timestamps), Spark identifies the Timestamp columns as BigInts. Some metadata
> may be missing.
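As a supplementary check (not part of the original report; the file names here are illustrative), the same round trip can be verified with pyarrow alone: whichever encoding is chosen at write time, the column should read back as a timestamp type rather than a plain int64.

{code:python}
import numpy as np
import pandas as pd
import pyarrow as pa
from pyarrow import parquet as pq

# Build the same three-row nanosecond-timestamp table as in the reproduction.
t = pa.timestamp('ns')
start = pd.Timestamp('2001-01-01').value
data = np.array([start, start + 1000, start + 2000], dtype='int64')
table = pa.Table.from_arrays([pa.array(data, type=t)], ['ts'])

for use_int96 in (True, False):
    path = 'roundtrip-int96.parquet' if use_int96 else 'roundtrip-int64.parquet'
    pq.write_table(table, path, use_deprecated_int96_timestamps=use_int96)
    # Read the file back and inspect the logical type of the 'ts' column.
    # Both encodings should restore a timestamp type, not a bare int64.
    read_back = pq.read_table(path)
    print(path, read_back.schema.field('ts').type)
{code}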
-- This message was sent by Atlassian JIRA (v6.4.14#64029)