[
https://issues.apache.org/jira/browse/SPARK-11657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Davies Liu updated SPARK-11657:
-------------------------------
Fix Version/s: 1.5.3, 1.6.0
> Bad Dataframe data read from parquet
> ------------------------------------
>
> Key: SPARK-11657
> URL: https://issues.apache.org/jira/browse/SPARK-11657
> Project: Spark
> Issue Type: Bug
> Components: Spark Core, SQL
> Affects Versions: 1.5.1, 1.5.2
> Environment: EMR (yarn)
> Reporter: Virgil Palanciuc
> Priority: Critical
> Fix For: 1.5.3, 1.6.0
>
> Attachments: sample.tgz
>
>
> I get strange behaviour when reading parquet data:
> {code}
> scala> val data = sqlContext.read.parquet("hdfs:///sample")
> data: org.apache.spark.sql.DataFrame = [clusterSize: int, clusterName:
> string, clusterData: array<string>, dpid: int]
> scala> data.take(1) /// this returns garbage
> res0: Array[org.apache.spark.sql.Row] =
> Array([1,56169A947F000101????????,WrappedArray(164594606101815510825479776971????????),813])
>
> scala> data.collect() /// this works
> res1: Array[org.apache.spark.sql.Row] =
> Array([1,6A01CACD56169A947F000101,WrappedArray(77512098164594606101815510825479776971),813])
> {code}
> I've attached the "hdfs:///sample" directory to this bug report.
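> A plausible explanation (an assumption on my part, not confirmed against the Spark source) is that the scan hands back references to a single mutable row buffer that is reused as it advances, so the rows returned by {{take}} end up aliasing stale buffer contents, while {{collect}} happens to materialize fresh rows. A minimal Scala sketch of that reuse pitfall, independent of Spark (the {{ReusingReader}} class here is purely illustrative):
> {code}
> // Sketch of the suspected pitfall: a reader that reuses one mutable
> // buffer for every "row", the way a columnar scanner might.
> object RowReuseSketch {
>   class ReusingReader(rows: Seq[Seq[Int]]) extends Iterator[Array[Int]] {
>     private val buffer = new Array[Int](rows.head.length)
>     private var i = 0
>     def hasNext: Boolean = i < rows.length
>     def next(): Array[Int] = {
>       rows(i).copyToArray(buffer)
>       i += 1
>       buffer // same object every call -- callers must copy it
>     }
>   }
>
>   def main(args: Array[String]): Unit = {
>     val rows = Seq(Seq(1, 2), Seq(3, 4))
>
>     // Keeping the references without copying: every entry aliases the
>     // one buffer, so earlier rows show the last row's (garbage) values.
>     val aliased = new ReusingReader(rows).toArray
>     println(aliased.map(_.mkString(",")).mkString(" | ")) // 3,4 | 3,4
>
>     // A defensive copy per row restores the expected values.
>     val copied = new ReusingReader(rows).map(_.clone).toArray
>     println(copied.map(_.mkString(",")).mkString(" | "))  // 1,2 | 3,4
>   }
> }
> {code}
> If this is the cause, a defensive copy of each row before the scan advances (as in the second case above) would be the fix on the Spark side.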
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)