bvaradar commented on issue #2066: URL: https://github.com/apache/hudi/issues/2066#issuecomment-687831117
@n3nash @modi95: Can you comment on the timelines? @KarthickAN: Honestly, this was not raised as a major issue until recently. Having a materialized record key allows you to standardize on a data model. If I were trying to improve the compression, I would change the ordering of the key fields so that the low-cardinality columns come first in the prefix (the current order is 'sourceId,sourceAssetId,timestamp,sourceSignalId,aggregation') and see whether compression gets better. Also, we have seen gzip compression (the default for Hudi) outperform snappy in compressed size.
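
To make the key-reordering suggestion concrete, here is a minimal sketch of how the record key fields could be ordered from lowest to highest cardinality and turned into Hudi write options. The helper `build_hudi_key_options` and the distinct counts are hypothetical and only for illustration; the real cardinalities would have to be measured on the actual dataset. The option keys are standard Hudi write configs, but whether this ordering actually improves compression still needs to be verified by experiment.

```python
def build_hudi_key_options(distinct_counts, codec="gzip"):
    """Build Hudi write options with key fields ordered by ascending cardinality.

    distinct_counts maps each record-key column to its (measured or estimated)
    number of distinct values. This helper is a hypothetical illustration,
    not part of Hudi.
    """
    # Lowest-cardinality columns go first so that consecutive record keys
    # are more likely to share a long common prefix.
    ordered = sorted(distinct_counts, key=lambda col: (distinct_counts[col], col))
    return {
        "hoodie.datasource.write.recordkey.field": ",".join(ordered),
        "hoodie.datasource.write.keygenerator.class":
            "org.apache.hudi.keygen.ComplexKeyGenerator",
        # gzip is Hudi's default parquet codec; set explicitly for comparison runs.
        "hoodie.parquet.compression.codec": codec,
    }


# ASSUMPTION: these distinct counts are made up for the example.
example_counts = {
    "sourceId": 20,
    "sourceAssetId": 500,
    "timestamp": 1_000_000,
    "sourceSignalId": 5_000,
    "aggregation": 5,
}

options = build_hudi_key_options(example_counts)
for key, value in options.items():
    print(f"{key}={value}")
```

The resulting dictionary could be passed to a Spark writer via `.options(**options)`. Note that reordering the key fields changes the materialized record key values, so on an existing table previously written records would no longer match incoming ones for upserts; this kind of experiment is easiest to run on a fresh copy of the data.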
