[jira] [Updated] (HIVE-26888) Hive gives empty results with partition column filter for hive parquet table when data loaded through spark dataframe

Indhumathi Muthumurugesh (Jira) Fri, 23 Dec 2022 04:40:44 -0800


     [ 
https://issues.apache.org/jira/browse/HIVE-26888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Indhumathi Muthumurugesh updated HIVE-26888:
--------------------------------------------
    Description: 
1. From Spark SQL:

create table test(a int, b string) partitioned by (c string) stored as parquet;

insert into test select 1,'abc','part1';

 

2. Use a Spark DataFrame to generate a new Parquet file:

val df = spark.sql("select * from test");

df.write.mode("overwrite").parquet("/Users/indhu/Downloads/part=part1");

 

3. From Hive, create an external table in Parquet format and add a partition 
at that location:

create external table test(a int, b string) partitioned by (c string) 
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' 
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';

alter table test add partition(part='part1') 
location '/Users/indhu/Downloads/part=part1';

select * from test where part='part1';

Result: the query with the partition column filter returns an empty result.
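
To help narrow this down, the files written in step 2 can be read back directly from Spark, bypassing both table definitions. The following is a minimal sketch and not part of the original reproduction. It assumes it is pasted into spark-shell (or run with Spark on the classpath) after steps 1 and 2, and that the path from step 2 exists. The names `session`, `path` and `written`, and the app name, are illustrative only.

```scala
import org.apache.spark.sql.SparkSession

// Reuses the spark-shell session if one exists; otherwise starts a local one.
// (The app name is an arbitrary label chosen for this sketch.)
val session = SparkSession.builder()
  .master("local[*]")
  .appName("HIVE-26888-check")
  .getOrCreate()

// Directory written by df.write.mode("overwrite").parquet(...) in step 2.
val path = "/Users/indhu/Downloads/part=part1"

// Read the Parquet files directly, without going through any table metadata.
val written = session.read.parquet(path)

// Print the columns that are physically stored in the data files.
written.printSchema()

// Print the rows that are physically stored in the data files.
written.show(false)
```

Because step 2 writes the result of `select * from test` without `partitionBy`, the data files would be expected to contain the column `c` alongside `a` and `b`. Comparing that schema with the Hive external table definition in step 3 may help show where the two diverge.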

> Hive gives empty results with partition column filter for hive parquet table 
> when data loaded through spark dataframe
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-26888
>                 URL: https://issues.apache.org/jira/browse/HIVE-26888
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Indhumathi Muthumurugesh
>            Priority: Major
>
> 1. From Spark SQL:
> create table test(a int, b string) partitioned by (c string) stored as parquet;
> insert into test select 1,'abc','part1';
>  
> 2. Use a Spark DataFrame to generate a new Parquet file:
> val df = spark.sql("select * from test");
> df.write.mode("overwrite").parquet("/Users/indhu/Downloads/part=part1");
>  
> 3. From Hive, create an external table in Parquet format and add a partition 
> at that location:
> create external table test(a int, b string) partitioned by (c string) 
> ROW FORMAT SERDE 
> 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
> STORED AS INPUTFORMAT 
> 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' 
> OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';
> alter table test add partition(part='part1') location 
> '/Users/indhu/Downloads/part=part1';
> select * from test where part='part1';



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
