[jira] [Commented] (SPARK-29234) bucketed table created by Spark SQL DataFrame is in SequenceFile format

Suchintak Patnaik (Jira) Tue, 24 Sep 2019 22:02:01 -0700


    [ 
https://issues.apache.org/jira/browse/SPARK-29234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16937408#comment-16937408
 ]


Suchintak Patnaik commented on SPARK-29234:
-------------------------------------------

Is it possible to backport the PRs to version 2.3 as well?

> bucketed table created by Spark SQL DataFrame is in SequenceFile format
> -----------------------------------------------------------------------
>
>                 Key: SPARK-29234
>                 URL: https://issues.apache.org/jira/browse/SPARK-29234
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Suchintak Patnaik
>            Priority: Major
>
> When we create a bucketed table as follows, its input and output formats are 
> displayed as SequenceFile. Physically, however, the files are created in 
> HDFS in the format specified by the user, e.g. orc, parquet, etc.
> df.write.format("orc").bucketBy(4,"order_status").saveAsTable("OrdersExample")
> In Hive, DESCRIBE FORMATTED OrdersExample shows:
> describe formatted ordersExample;
> OK
> # col_name              data_type               comment
> col                     array<string>           from deserializer
> # Storage Information
> SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> InputFormat:            org.apache.hadoop.mapred.SequenceFileInputFormat
> OutputFormat:           org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
> Querying the same table in Hive gives an error.
> select * from OrdersExample;
> OK
> Failed with exception java.io.IOException:java.io.IOException: hdfs://nn01.itversity.com:8020/apps/hive/warehouse/kuki.db/ordersexample/part-00000-55920574-eeb5-48b7-856d-e5c27e85ba12_00000.c000.snappy.orc not a SequenceFile
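
The mismatch described in the report (metastore says SequenceFile, data files are ORC) can be checked mechanically. The following is only a minimal sketch in plain Python, not part of Spark or Hive: the helper names and the suffix table are hypothetical, and it assumes the data file's final suffix reflects its real format, as it does for the path in the error above.

```python
import os

# Hypothetical lookup from a data file's final suffix to the format it implies.
# Only the formats named in this report are listed.
SUFFIX_TO_FORMAT = {".orc": "orc", ".parquet": "parquet"}

# Labels searched for in the InputFormat class name, most specific first.
KNOWN_LABELS = ("sequencefile", "orc", "parquet", "text")


def format_from_path(path):
    """Guess the on-disk format from the file's last suffix (None if unknown)."""
    _, suffix = os.path.splitext(path)
    return SUFFIX_TO_FORMAT.get(suffix.lower())


def declared_format(input_format_class):
    """Reduce a fully qualified InputFormat class name to a short label."""
    simple_name = input_format_class.rsplit(".", 1)[-1].lower()
    for label in KNOWN_LABELS:
        if label in simple_name:
            return label
    return None


def is_mismatch(input_format_class, data_file):
    """True when both formats are recognised and they disagree."""
    on_disk = format_from_path(data_file)
    declared = declared_format(input_format_class)
    return on_disk is not None and declared is not None and on_disk != declared


if __name__ == "__main__":
    # Values copied from the DESCRIBE FORMATTED output and the error message.
    input_format = "org.apache.hadoop.mapred.SequenceFileInputFormat"
    data_file = (
        "hdfs://nn01.itversity.com:8020/apps/hive/warehouse/kuki.db/ordersexample/"
        "part-00000-55920574-eeb5-48b7-856d-e5c27e85ba12_00000.c000.snappy.orc"
    )
    print(is_mismatch(input_format, data_file))
```

Run against the values in this report, the check flags a mismatch, which is consistent with Hive's "not a SequenceFile" failure.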



