BalaMahesh opened a new issue #2203:
URL: https://github.com/apache/hudi/issues/2203


   **Describe the problem you faced**
   
   A Hive query on a Hudi table returns no results for some partitions when the 
partition column is used in the where condition. I have verified that the 
partitions exist using show partitions, desc formatted, etc.
   
   I can also see the .hoodie_partition_metadata file and the parquet file 
in the table partition directory. Running cat with parquet-tools on the 
file shows that it contains exactly one ingested event.
   
   select count(*), dt from _ro group by dt; : This query returns a 
count of 1 for that partition (y).
   
   select * from _ro where id=x; (x in the partition y)
   
   but when I run
   
   select * from _ro where dt="y"; it returns an empty result, whereas for other 
dt values it returns results.
   
   I am not sure where exactly the issue is: whether it is because the file is 
small and has only one record, or whether Hive is misbehaving. I have checked 
the query logs, and they show numFiles = 1, numSplits = 1.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Ingest records using HoodieDeltaStreamer from a JsonKafka source.
   2. Partition the data on a date field in yyyy-MM-dd format.
   3. Query the _ro table.
   
   
   **Expected behavior**
   
   The query should return the single row in that partition.
   
   **Environment Description**
   
   * Hudi version : 0.6.1
   
   * Spark version : 2.4.7
   
   * Hive version : 1.2
   
   * Hadoop version : 2.7.1
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : No
   
   
   
   

