[GitHub] [hudi] Zhangshunyu opened a new issue, #7032: [SUPPORT] When metatable enabled some query result will be empty

GitBox Sat, 22 Oct 2022 02:21:46 -0700


Zhangshunyu opened a new issue, #7032:
URL: https://github.com/apache/hudi/issues/7032


   When we enable the metadata table, use "id, t" as the column-stats columns and
turn on data skipping, some id values taken from the table (both values exist in the
table) return results when used as a filter, but others return nothing. The queries
look like this:
   select * from table_a where id in ('id001');
   select * from table_a where id in ('id002');
   Both 'id001' and 'id002' exist, but the 'id001' query returns rows while the
'id002' query returns an empty result.
   We also found that the set of candidate files left after the index filter is
applied is empty for 'id002'. Could the MIN/MAX values stored in the metadata table
be wrong?
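   The following minimal Scala sketch makes the MIN/MAX suspicion concrete by showing how min/max (column-stats) pruning treats an `IN` filter. It is a simplified model, not Hudi's actual data-skipping code. `ColumnStats`, `candidateFiles`, `inconsistentFiles`, the file names and the ranges are all hypothetical. If a file's recorded range does not cover a value the file really contains, that file is pruned and the query silently returns nothing.

```scala
// Hypothetical model of min/max (column-stats) data skipping for an IN filter.
// Not Hudi's implementation; all names and data here are illustrative only.
final case class ColumnStats(fileName: String, min: String, max: String)

object MinMaxPruningSketch {
  // A file may contain one of `values` only if some value lies in [min, max].
  // String columns compare lexicographically.
  def candidateFiles(stats: Seq[ColumnStats], values: Set[String]): Seq[String] =
    stats.filter(s => values.exists(v => s.min <= v && v <= s.max)).map(_.fileName)

  // Flags files whose recorded range fails to cover a value actually in the file.
  def inconsistentFiles(stats: Seq[ColumnStats], actual: Map[String, Seq[String]]): Seq[String] =
    stats.filter { s =>
      actual.getOrElse(s.fileName, Seq.empty[String]).exists(v => v < s.min || v > s.max)
    }.map(_.fileName)

  // Hypothetical file contents for the `id` column.
  val actualIds: Map[String, Seq[String]] = Map(
    "f1.parquet" -> Seq("id001"),
    "f2.parquet" -> Seq("id002", "id004"))

  // Stats that match the data.
  val correctStats: Seq[ColumnStats] = Seq(
    ColumnStats("f1.parquet", "id001", "id001"),
    ColumnStats("f2.parquet", "id002", "id004"))

  // Simulated bad stats: f2's recorded min is too high, so "id002" falls outside it.
  val brokenStats: Seq[ColumnStats] = Seq(
    ColumnStats("f1.parquet", "id001", "id001"),
    ColumnStats("f2.parquet", "id003", "id004"))

  def main(args: Array[String]): Unit = {
    println(s"correct stats, id002: ${candidateFiles(correctStats, Set("id002"))}")
    println(s"broken stats,  id001: ${candidateFiles(brokenStats, Set("id001"))}")
    println(s"broken stats,  id002: ${candidateFiles(brokenStats, Set("id002"))}")
    println(s"inconsistent files:   ${inconsistentFiles(brokenStats, actualIds)}")
  }
}
```

   With the simulated bad stats, the 'id001' lookup still finds its file, but the 'id002' lookup prunes every file. This matches the reported pattern. A check in the style of `inconsistentFiles`, run against stats read from the metadata table, is one way to confirm or rule out bad MIN/MAX entries.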
   Our config is as follows:

   hudi 0.11
   spark 3.1.1

   DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY -> "years,months,days",
   "hoodie.sql.insert.mode" -> "non-strict",
   "hoodie.bulkinsert.sort.mode" -> "GLOBAL_SORT",
   "hoodie.metadata.enable" -> "true",
   "hoodie.bulkinsert.shuffle.parallelism" -> "300",
   "hoodie.parquet.max.file.size" -> "134217728",
   "hoodie.parquet.compression.codec" -> "snappy",
   "hoodie.parquet.dictionary.enabled" -> "false",
   "hoodie.metadata.index.column.stats.enable" -> "true",
   "hoodie.enable.data.skipping" -> "true",
   "hoodie.cleaner.policy.failed.writes" -> "LAZY",
   "hoodie.clean.automatic" -> "false",
   "hoodie.metadata.index.column.stats.column.list" -> "id, t",
   "hoodie.metadata.index.column.stats.file.group.count" -> "10",
   "hoodie.metadata.clean.async" -> "true",
   "hoodie.metadata.compact.max.delta.commits" -> "4")
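   For reproduction, the same options can be written as a self-contained Scala `Map[String, String]`. The report does not show what the trailing `)` closes or how the options reached the writer, so the object name `ReportedHudiOptions` is hypothetical. As an assumption, `DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY` is replaced by its string key `hoodie.datasource.write.partitionpath.field`, so the snippet compiles without a Hudi dependency.

```scala
// Hypothetical container for the options listed in the report.
// Assumption: PARTITIONPATH_FIELD_OPT_KEY is spelled out as its string key.
object ReportedHudiOptions {
  val hudiOptions: Map[String, String] = Map(
    "hoodie.datasource.write.partitionpath.field" -> "years,months,days",
    "hoodie.sql.insert.mode" -> "non-strict",
    "hoodie.bulkinsert.sort.mode" -> "GLOBAL_SORT",
    "hoodie.metadata.enable" -> "true",
    "hoodie.bulkinsert.shuffle.parallelism" -> "300",
    "hoodie.parquet.max.file.size" -> "134217728",
    "hoodie.parquet.compression.codec" -> "snappy",
    "hoodie.parquet.dictionary.enabled" -> "false",
    "hoodie.metadata.index.column.stats.enable" -> "true",
    "hoodie.enable.data.skipping" -> "true",
    "hoodie.cleaner.policy.failed.writes" -> "LAZY",
    "hoodie.clean.automatic" -> "false",
    "hoodie.metadata.index.column.stats.column.list" -> "id, t",
    "hoodie.metadata.index.column.stats.file.group.count" -> "10",
    "hoodie.metadata.clean.async" -> "true",
    "hoodie.metadata.compact.max.delta.commits" -> "4")

  // Prints the options sorted by key, e.g. for pasting into a reproduction.
  def main(args: Array[String]): Unit =
    hudiOptions.toSeq.sortBy(_._1).foreach { case (k, v) => println(s"$k=$v") }
}
```

   In a Spark job such a map could be passed with `df.write.format("hudi").options(...)`. That call is shown only for orientation and is not taken from the report.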

