bvaradar commented on issue #2066: URL: https://github.com/apache/hudi/issues/2066#issuecomment-687274706
In this case, you are using a complex key that combines 5 columns. From what I have seen in user deployments, this is very unusual (the most common case is 1 or 2 columns). Having a materialized record key has its benefits. That said, the extra size you are seeing could be because the individual columns would have been highly compressible on their own, while their concatenation is not. Another factor is what fraction of the table's columns make up the record key. You can use parquet-tools to look at column- and block-level stats for both the plain Parquet and the Hudi files to get more insight. BTW, as you may have noticed on the dev@ and user@ community mailing lists, there is ongoing work on making the record_key virtual.
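To illustrate the compressibility point, here is a rough sketch in plain Python that models only dictionary encoding. Each column's size is estimated as its distinct values stored once plus a bit-packed index per row. The column names, their cardinalities, the `name:value,name:value,...` layout of the materialized key, and the `dict_encoded_size` helper are all hypothetical. Real Parquet also applies RLE and a compression codec and falls back to plain encoding for large dictionaries, so treat the numbers as illustrative only.

```python
import itertools
import math

# Hypothetical key columns and their cardinalities (assumptions for illustration).
KEY_COLUMNS = {"country": 10, "region": 10, "store": 10, "dept": 5, "channel": 4}


def dict_encoded_size(values):
    """Rough size model of a dictionary-encoded column (hypothetical helper).

    Counts the bytes of each distinct value once (the dictionary) plus one
    bit-packed index per row. Ignores RLE, page headers and compression codecs.
    """
    distinct = set(values)
    dictionary_bytes = sum(len(v.encode("utf-8")) for v in distinct)
    bit_width = max(1, math.ceil(math.log2(len(distinct))))
    index_bytes = math.ceil(len(values) * bit_width / 8)
    return dictionary_bytes + index_bytes


# Every combination of the five columns, so the composite key is unique per row.
names = list(KEY_COLUMNS)
rows = [
    dict(zip(names, combo))
    for combo in itertools.product(
        *[[f"{name}_{i}" for i in range(n)] for name, n in KEY_COLUMNS.items()]
    )
]

# Size of each key column stored on its own (low cardinality, small dictionary).
per_column = {name: dict_encoded_size([r[name] for r in rows]) for name in names}

# Materialized composite key, assuming a "name:value,name:value,..." layout.
composite_keys = [",".join(f"{name}:{r[name]}" for name in names) for r in rows]
composite = dict_encoded_size(composite_keys)

print(f"rows: {len(rows)}")
print(f"sum of individual key columns: {sum(per_column.values())} bytes")
print(f"materialized composite key:    {composite} bytes")
```

Under these assumptions the composite key column comes out more than an order of magnitude larger than the five source columns combined, because every concatenated key is distinct and the dictionary ends up holding every row's full key, while each source column needs only a handful of dictionary entries and a few bits per row.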
