[ https://issues.apache.org/jira/browse/SPARK-29234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16937408#comment-16937408 ]
Suchintak Patnaik commented on SPARK-29234:
-------------------------------------------

Is it possible to backport the PRs to version 2.3 as well?

> bucketed table created by Spark SQL DataFrame is in SequenceFile format
> -----------------------------------------------------------------------
>
>                 Key: SPARK-29234
>                 URL: https://issues.apache.org/jira/browse/SPARK-29234
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Suchintak Patnaik
>            Priority: Major
>
> When we create a bucketed table as follows, its input and output formats are
> displayed as SequenceFile. Physically, however, the files are created in
> HDFS in the format specified by the user, e.g. orc, parquet, etc.
>
> df.write.format("orc").bucketBy(4,"order_status").saveAsTable("OrdersExample")
>
> In Hive, DESCRIBE FORMATTED OrdersExample;
>
> describe formatted ordersExample;
> OK
> # col_name data_type comment
> col array<string> from deserializer
> # Storage Information
> SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> InputFormat: org.apache.hadoop.mapred.SequenceFileInputFormat
> OutputFormat: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>
> Querying the same table in Hive gives an error.
>
> select * from OrdersExample;
> OK
> Failed with exception java.io.IOException:java.io.IOException:
> hdfs://nn01.itversity.com:8020/apps/hive/warehouse/kuki.db/ordersexample/part-00000-55920574-eeb5-48b7-856d-e5c27e85ba12_00000.c000.snappy.orc
> not a SequenceFile
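
The mismatch described in the report (metadata says SequenceFile, data files are ORC) can be spotted mechanically. The following is only a minimal sketch, not part of the issue or of any Spark/Hive tooling: the helper names `parse_storage_info` and `format_mismatch` are hypothetical, and it assumes the DESCRIBE FORMATTED output is available as plain text with one "Key: value" pair per line and that the data file's extension reflects its real format.

```python
# Hive InputFormat classes normally registered for two common file formats.
EXPECTED_INPUT_FORMAT = {
    "orc": "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat",
    "parquet": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
}


def parse_storage_info(describe_output):
    """Collect 'Key: value' pairs from DESCRIBE FORMATTED text (hypothetical helper)."""
    info = {}
    for line in describe_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            info[key.strip()] = value.strip()
    return info


def format_mismatch(describe_output, data_file):
    """Return True when the declared InputFormat disagrees with the file extension."""
    extension = data_file.rsplit(".", 1)[-1].lower()
    expected = EXPECTED_INPUT_FORMAT.get(extension)
    declared = parse_storage_info(describe_output).get("InputFormat")
    # Unknown extensions are not flagged; only a known format with a
    # different declared InputFormat counts as a mismatch.
    return expected is not None and declared != expected


if __name__ == "__main__":
    describe_output = "\n".join([
        "# Storage Information",
        "SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
        "InputFormat: org.apache.hadoop.mapred.SequenceFileInputFormat",
        "OutputFormat: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat",
    ])
    data_file = "part-00000-55920574-eeb5-48b7-856d-e5c27e85ba12_00000.c000.snappy.orc"
    print(format_mismatch(describe_output, data_file))
```

With the storage information quoted in the report, the check flags the table, since an `.orc` data file is paired with a SequenceFile InputFormat.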