deep-teliacompany opened a new issue #3132:
URL: https://github.com/apache/hudi/issues/3132
A Hudi table returns no records when queried from Hive with partition columns
in the WHERE clause, while the same query returns the expected records in Impala.
The Hudi table was created in Hive as follows:
CREATE EXTERNAL TABLE t_test_hudi (
  `_hoodie_commit_time` string,
  `_hoodie_commit_seqno` string,
  `_hoodie_record_key` string,
  `_hoodie_partition_path` string,
  `_hoodie_file_name` string,
  `DagName` string,
  `sequence_number` int,
  `dt_timestamp` date,
  `cdl_ingest_time` string)
PARTITIONED BY (
  `ing_year` int,
  `ing_month` int,
  `ing_day` int
)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
  'org.apache.hudi.hadoop.HoodieParquetInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  '/data/test/swe/base/cusin_gsmahs/mobile_customer/t_cusin_customer_d1/';
The query below, which has no partition filter, works fine:
"Select * from t_test_hudi"
Expected behavior
The query "Select * from t_test_hudi where ing_year=2021 and ing_month=4 and
ing_day=4" should return records in Hive, as it does in Impala.
Environment Description
Hudi version : 0.8.0
Spark version : 2.4.3
Hive version : 3.1
Hadoop version : Distribution CDH-7.1.4
Storage (HDFS/S3/GCS..) : HDFS
Running on Docker? (yes/no) : No
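Diagnostic sketch
The statements below are a possible way to narrow the problem down from the Hive side, not a confirmed fix. They assume the table and partition values shown above. The `hive.input.format` setting in step 4 is taken from general Hudi guidance for Hive queries, and whether it is relevant to this issue is an assumption.

```sql
-- Diagnostic sketch only; none of these steps is confirmed as the cause or the fix.

-- 1. List the partitions the Hive metastore has registered for the table.
SHOW PARTITIONS t_test_hudi;

-- 2. Inspect how the filtered partition is registered (location, input format).
--    This fails if the partition is not known to the metastore.
DESCRIBE FORMATTED t_test_hudi PARTITION (ing_year=2021, ing_month=4, ing_day=4);

-- 3. Compare the partition path stored by Hudi with the Hive partition columns.
SELECT DISTINCT `_hoodie_partition_path`, ing_year, ing_month, ing_day
FROM t_test_hudi
LIMIT 20;

-- 4. Retry the failing query with a non-combining input format
--    (assumption: the default combining input format may be involved).
SET hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
SELECT * FROM t_test_hudi
WHERE ing_year=2021 AND ing_month=4 AND ing_day=4;
```

If step 1 or 2 shows the partition missing or pointing at an unexpected location, the problem is likely in partition registration rather than in the query itself.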