[
https://issues.apache.org/jira/browse/HUDI-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sivabalan narayanan updated HUDI-1570:
--------------------------------------
Description:
Many users want to know the avg record size of their data in Hudi storage,
so that they can derive their bloom config values. Currently there is no easy
way for an end user to fetch the record size. With hudi-cli, it can be worked
out from the commit metadata, but only through a rough manual calculation. It
would be better to store the avg record size (total bytes written / total
records written) with WriteStats as well as in the commit metadata. hudi-cli
could then expose this info along with "commit showpartitions", or through
another command such as "commit showmetadata".
For now, the avg size can be calculated as bytes written / records written
from the commit metadata.
!Screen Shot 2021-01-31 at 7.05.55 PM.png!
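The rough calculation described above could be sketched as follows. This is a
minimal illustration, not the proposed implementation: it assumes the commit
metadata has already been loaded as JSON, that write stats are grouped per
partition under a "partitionToWriteStats" key, and that each stat carries
"totalWriteBytes" and "numWrites" fields. The helper name avg_record_size and
the sample values are hypothetical.

```python
import json


def avg_record_size(commit_metadata: dict) -> float:
    """Estimate avg record size as total bytes written / total records written.

    Assumes (not confirmed by this issue) that write stats live under
    "partitionToWriteStats" and expose "totalWriteBytes" and "numWrites".
    """
    total_bytes = 0
    total_records = 0
    for stats in commit_metadata.get("partitionToWriteStats", {}).values():
        for stat in stats:
            total_bytes += stat.get("totalWriteBytes", 0)
            total_records += stat.get("numWrites", 0)
    # Guard against empty commits so we never divide by zero.
    if total_records == 0:
        return 0.0
    return total_bytes / total_records


# Hypothetical commit metadata with two partitions (values are made up).
sample = json.loads("""
{
  "partitionToWriteStats": {
    "2021/01/30": [{"totalWriteBytes": 1200, "numWrites": 10}],
    "2021/01/31": [{"totalWriteBytes": 2800, "numWrites": 30}]
  }
}
""")

print(avg_record_size(sample))
```

Since bytes written reflect the on-disk file size, the result is only a rough
estimate of the stored record size, which matches the "rough calculation"
mentioned in the description.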
was:
Many users want to understand what would be their avg record size in hudi
storage. As of now, there is no easy way to fetch record size for the end user.
Even w/ hudi-cli, we could decipher from commit metadata, but we need to make
some rough calculation. So, it would be better if we store the avg record size
w/ WriteStats (total bytes written/ total records written) , as well as in
commit metadata. So, in hudi_cli, we could expose this info along w/ "commit
showpartitions" or expose another command "commit showmetadata" or something.
As of now, we could calculate the avg size from bytes written/records written
from commit metadata.
!Screen Shot 2021-01-31 at 7.05.55 PM.png!
> Add Avg record size in commit metadata
> --------------------------------------
>
> Key: HUDI-1570
> URL: https://issues.apache.org/jira/browse/HUDI-1570
> Project: Apache Hudi
> Issue Type: Improvement
> Components: Utilities
> Reporter: sivabalan narayanan
> Assignee: sivabalan narayanan
> Priority: Major
> Attachments: Screen Shot 2021-01-31 at 7.05.55 PM.png
>
>
> Many users want to know the avg record size of their data in Hudi storage,
> so that they can derive their bloom config values. Currently there is no easy
> way for an end user to fetch the record size. With hudi-cli, it can be worked
> out from the commit metadata, but only through a rough manual calculation. It
> would be better to store the avg record size (total bytes written / total
> records written) with WriteStats as well as in the commit metadata. hudi-cli
> could then expose this info along with "commit showpartitions", or through
> another command such as "commit showmetadata".
> For now, the avg size can be calculated as bytes written / records written
> from the commit metadata.
> !Screen Shot 2021-01-31 at 7.05.55 PM.png!
>
>