[jira] [Commented] (SPARK-9807) pyspark.sql.createDataFrame does not infer data type of parsed TSV

Yanbo Liang (JIRA) Tue, 25 Aug 2015 23:37:58 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-9807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712592#comment-14712592
 ]


Yanbo Liang commented on SPARK-9807:
------------------------------------

The documentation is correct. It says that the type of each column can be 
inferred from the data, which is an RDD.
But in your case, you load text data into an RDD, so each element of the RDD 
is a list of strings. The resulting DataFrame therefore has string type for 
every column, which is the expected behavior.
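
One way to get typed columns is to cast each field before the DataFrame is 
built. The following is only a minimal sketch, not the reporter's code: the 
helper names `cast_field` and `parse_line`, the sample lines, the `path` 
variable and the column names are all hypothetical. The casting runs in plain 
Python; the Spark calls are shown as comments because they need a live 
SparkContext (`sc`) and SQLContext (`sqlContext`).

```python
def cast_field(text):
    """Return text as an int, else a float, else the unchanged string."""
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            pass
    return text


def parse_line(line):
    """Split a whitespace-separated line and cast every field."""
    return tuple(cast_field(field) for field in line.split())


# Hypothetical sample lines standing in for the contents of the HDFS file.
lines = ["1 2.5 alice", "2 3.0 bob"]
rows = [parse_line(line) for line in lines]

# With Spark available, the same function would be mapped over the RDD
# before the DataFrame is created (assumed names, not executed here):
#   rdd = sc.textFile(path).map(parse_line)
#   df = sqlContext.createDataFrame(rdd, ["id", "score", "name"])
```

Because the RDD elements now hold Python ints and floats rather than strings, 
inference has real types to work with. Passing an explicit schema to 
`createDataFrame` would be an alternative when guessing types per field is 
too loose for the data.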

> pyspark.sql.createDataFrame does not infer data type of parsed TSV
> ------------------------------------------------------------------
>
>                 Key: SPARK-9807
>                 URL: https://issues.apache.org/jira/browse/SPARK-9807
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.4.1
>         Environment: CentOS 6, Python version 2.7.10, Scala version 2.10
>            Reporter: Karen Yin-Yee Ng
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> I tried parsing a space-separated file from HDFS and using 
> `pyspark.sqlContext.createDataFrame` to convert the parsed lines to a 
> PySpark DataFrame. However, all entries are parsed as string type 
> regardless of what the correct data type is.
> An example of my code and output can be found at:
> https://gist.github.com/karenyyng/a1264d6344c54df4fcc5



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

