[jira] [Updated] (SPARK-13141) Dataframe created from Hive partitioned tables using HiveContext returns wrong results

Simone (JIRA) Tue, 02 Feb 2016 01:20:07 -0800

     [ https://issues.apache.org/jira/browse/SPARK-13141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Simone updated SPARK-13141:
---------------------------
    Description: 
I get wrong dataframe results using HiveContext with Spark 1.5.0 on CDH 5.5.1 
in yarn-client mode.

The problem occurs with partitioned tables on text delimited HDFS data, both 
with Scala and Python.

This is an example:
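// Assumes sc is an existing SparkContext (e.g. the one spark-shell provides)
import org.apache.spark.sql.hive.HiveContext
val hc = new HiveContext(sc)
// Read the partitioned Hive table and print its first 20 rows
hc.table("my_db.partition_table").show()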

The result is that every value in every row is NULL, except for the first 
column (which contains the whole line of data) and the partitioning columns, 
which appear to be correct.
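
The symptom above (the whole line in the first column, NULL in every other non-partition column) matches what a delimited-text reader produces when it splits rows on a field delimiter that never occurs in the data. The sketch below is a hypothetical, self-contained Scala illustration of that effect only. It assumes comma-delimited data read with Hive's default Ctrl-A field delimiter. The names `DelimiterMismatchDemo` and `parseLine` are invented for illustration and are not Spark or Hive APIs. It does not establish the root cause of this issue.

```scala
import java.util.regex.Pattern

// Hypothetical illustration only: this is not Spark or Hive code.
// It mimics how a delimited-text reader maps one line onto a fixed
// number of columns, filling missing fields with None (NULL).
object DelimiterMismatchDemo {
  // Hive's default field delimiter is Ctrl-A (character code 1).
  val HiveDefaultDelim: Char = 1.toChar

  def parseLine(line: String, delim: Char, numCols: Int): Seq[Option[String]] = {
    // Quote the delimiter so it is matched literally; limit -1 keeps
    // trailing empty fields instead of dropping them.
    val fields = line.split(Pattern.quote(delim.toString), -1)
    // Columns beyond the last parsed field become None, i.e. NULL.
    (0 until numCols).map(i => if (i < fields.length) Some(fields(i)) else None)
  }

  def main(args: Array[String]): Unit = {
    val line = "1,alice,42" // assumed comma-delimited sample row
    // Matching delimiter: each column receives its own value.
    println(parseLine(line, ',', 3))
    // Mismatched delimiter: no split happens, so the whole line ends up in
    // the first column and the remaining columns are None.
    println(parseLine(line, HiveDefaultDelim, 3))
  }
}
```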

With Hive and Impala I get correct results.

I think a similar problem also occurs with Avro data:
https://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Pyspark-Table-Dataframe-returning-empty-records-from-Partitioned/td-p/35836

  was:
I get wrong dataframe result using HiveContext with Spark 1.5.0 on CDH 5.5.1 in 
yarn-client mode.

The problem occurs with partitioned tables on text delimited HDFS data, both 
with Scala and Python.

This is an example:
import org.apache.spark.sql.hive.HiveContext
val hc = new HiveContext(sc)
hc.table("my_db.partition_table").show()

The result is that every value in every row is NULL, except for the first 
column (which contains the whole line of data) and the partitioning columns, 
which appear to be correct.

With Hive and Impala I get correct results.

I think a similar problem also occurs with Avro data:
https://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Pyspark-Table-Dataframe-returning-empty-records-from-Partitioned/td-p/35836


> Dataframe created from Hive partitioned tables using HiveContext returns wrong results
> --------------------------------------------------------------------------------------
>
>                 Key: SPARK-13141
>                 URL: https://issues.apache.org/jira/browse/SPARK-13141
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.0
>         Environment: CDH 5.5.1
>            Reporter: Simone
>            Priority: Critical



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
