[
https://issues.apache.org/jira/browse/ARROW-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16265991#comment-16265991
]
Licht Takeuchi commented on ARROW-1436:
---------------------------------------
Here is the result from the older Spark version:
{code:scala}
scala> // int96 timestamp case
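scala> // (assumes a spark-shell session where ParquetFileReader, Path, and a Hadoop conf/fs were already imported and initialized)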
scala> ParquetFileReader.readAllFootersInParallel(conf, fs.getFileStatus(new Path("test-1.parquet")))
res0: java.util.List[org.apache.parquet.hadoop.Footer] = [Footer{file:/Users/rito/GitHub/arrow/python/test-1.parquet, ParquetMetaData{FileMetaData{schema: message schema {
  optional int96 ts;
}
, metadata: {}}, blocks: [BlockMetaData{3, 104 [ColumnMetaData{SNAPPY [ts] INT96 [PLAIN_DICTIONARY, RLE, PLAIN], 47}]}]}}]
scala> var df = sqlContext.read.parquet("test-1.parquet")
df: org.apache.spark.sql.DataFrame = [ts: timestamp]
scala> df.take(3)
res1: Array[org.apache.spark.sql.Row] = Array([2001-01-01 09:00:00.0], [2001-01-01 09:00:00.000001], [2001-01-01 09:00:00.000002])
{code}
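For context, a file like {{test-1.parquet}} above can be produced with PyArrow's INT96 compatibility flag. Here is a minimal sketch, assuming PyArrow and pandas are installed; the microsecond-spaced values mirror the three rows Spark reads back above (the 09:00 display is consistent with a UTC+9 session time zone).
{code:python}
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

# Three timestamps one microsecond apart, matching the rows Spark reads back above.
df = pd.DataFrame({'ts': pd.date_range('2001-01-01 00:00:00', periods=3, freq='us')})

# use_deprecated_int96_timestamps=True stores the column with the INT96
# physical type that Spark < 2.2.0 expects for timestamps.
pq.write_table(pa.Table.from_pandas(df), 'test-1.parquet',
               use_deprecated_int96_timestamps=True)
{code}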
> PyArrow Timestamps written to Parquet as INT96 appear in Spark as 'bigint'
> --------------------------------------------------------------------------
>
> Key: ARROW-1436
> URL: https://issues.apache.org/jira/browse/ARROW-1436
> Project: Apache Arrow
> Issue Type: Bug
> Components: Format, Python
> Affects Versions: 0.6.0
> Reporter: Lucas Pickup
> Assignee: Licht Takeuchi
> Fix For: 0.8.0
>
>
> When using the 'use_deprecated_int96_timestamps' option to write Parquet
> files compatible with Spark <2.2.0 (which doesn't support INT64-backed
> timestamps), Spark identifies the timestamp columns as BigInt. Some
> metadata may be missing.
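A quick way to confirm which physical type a given file actually carries is to inspect the Parquet metadata from Python. A minimal sketch, assuming the {{test-1.parquet}} file from the comment above and a PyArrow build that exposes {{pyarrow.parquet.read_metadata}}:
{code:python}
import pyarrow.parquet as pq

# Inspect the 'ts' column chunk in the first row group.
meta = pq.read_metadata('test-1.parquet')
col = meta.row_group(0).column(0)
print(col.path_in_schema, col.physical_type)
# With use_deprecated_int96_timestamps=True this prints: ts INT96
# With the default INT64 storage, Spark < 2.2.0 reports the column as bigint.
{code}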