[jira] [Commented] (SPARK-14463) read.text broken for partitioned tables

Jurriaan Pruis (JIRA) Mon, 18 Apr 2016 11:11:10 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-14463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246211#comment-15246211
 ]


Jurriaan Pruis commented on SPARK-14463:
----------------------------------------

Why? I think this can be quite useful, at least for reading. I have some
partitioned text files and want to filter them quickly by partition before
processing them any further. This mostly worked in Spark 1.6.x, although I
ran into problems when working with the partition values themselves
(SPARK-14343).

> read.text broken for partitioned tables
> ---------------------------------------
>
>                 Key: SPARK-14463
>                 URL: https://issues.apache.org/jira/browse/SPARK-14463
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Michael Armbrust
>            Priority: Critical
>
> Strongly typing the return value of {{read.text}} as {{Dataset\[String]}}
> breaks when trying to load a partitioned table (or any table where the path
> looks partitioned):
> {code}
> Seq((1, "test"))
>   .toDF("a", "b")
>   .write
>   .format("text")
>   .partitionBy("a")
>   .save("/home/michael/text-part-bug")
> sqlContext.read.text("/home/michael/text-part-bug")
> {code}
> {code}
> org.apache.spark.sql.AnalysisException: Try to map struct<value:string,a:int> to Tuple1, but failed as the number of fields does not line up.
>  - Input schema: struct<value:string,a:int>
>  - Target schema: struct<value:string>;
>       at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.org$apache$spark$sql$catalyst$encoders$ExpressionEncoder$$fail$1(ExpressionEncoder.scala:265)
>       at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.validate(ExpressionEncoder.scala:279)
>       at org.apache.spark.sql.Dataset.<init>(Dataset.scala:197)
>       at org.apache.spark.sql.Dataset.<init>(Dataset.scala:168)
>       at org.apache.spark.sql.Dataset$.apply(Dataset.scala:57)
>       at org.apache.spark.sql.Dataset.as(Dataset.scala:357)
>       at org.apache.spark.sql.DataFrameReader.text(DataFrameReader.scala:450)
> {code}



