[ 
https://issues.apache.org/jira/browse/HUDI-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1570:
--------------------------------------
    Description: 
Many users want to know the average record size of their data in Hudi 
storage. Currently there is no easy way for an end user to fetch it. With 
hudi-cli, the value can be derived from commit metadata, but only through a 
rough manual calculation. It would be better to store the average record size 
(total bytes written / total records written) in WriteStats as well as in 
commit metadata. hudi-cli could then expose it alongside "commit 
showpartitions", or through a new command such as "commit showmetadata". 

As of now, the average size can be calculated as bytes written / records 
written from commit metadata. 
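
The manual calculation can be sketched roughly as follows. This is only an 
illustration: it assumes the commit metadata JSON has a 
"partitionToWriteStats" map whose entries carry "totalWriteBytes" and 
"numWrites". Those field names and the sample values are assumptions and 
should be checked against a real commit file. The helper name 
avg_record_size is hypothetical. 

```python
import json


def avg_record_size(commit_metadata: dict) -> float:
    """Return total bytes written / total records written for one commit.

    Assumes (not confirmed by this ticket) that the metadata holds a
    "partitionToWriteStats" map of partition path -> list of write stats,
    each with "totalWriteBytes" and "numWrites".
    """
    total_bytes = 0
    total_records = 0
    for write_stats in commit_metadata.get("partitionToWriteStats", {}).values():
        for stat in write_stats:
            total_bytes += stat.get("totalWriteBytes", 0)
            total_records += stat.get("numWrites", 0)
    # Avoid dividing by zero for a commit that wrote no records.
    return total_bytes / total_records if total_records else 0.0


# Made-up sample in the assumed layout; not taken from a real table.
sample = json.loads("""
{
  "partitionToWriteStats": {
    "2021/01/30": [{"totalWriteBytes": 1000, "numWrites": 10}],
    "2021/01/31": [{"totalWriteBytes": 3000, "numWrites": 30},
                   {"totalWriteBytes": 500, "numWrites": 5}]
  }
}
""")

print(avg_record_size(sample))
```

The sum runs over all partitions before dividing, so the result is weighted 
by record count rather than being a mean of per-file averages. 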

!Screen Shot 2021-01-31 at 7.05.55 PM.png!

  was:
Many users want to understand what would be their avg record size. As of now, 
there is no easy way to fetch record size for the end user. Even w/ hudi-cli, 
we could decipher from commit metadata, but we need to make some rough 
calculation. So, it would be better if we store the avg record size w/ 
WriteStats (total bytes written/ total records written) , as well as in commit 
metadata. So, in hudi_cli, we could expose this info along w/ "commit 
showpartitions" or expose another command "commit showmetadata" or something. 

As of now, we could calculate the avg size from bytes written/records written 
from commit metadata. 

!Screen Shot 2021-01-31 at 7.05.55 PM.png!


> Add Avg record size in commit metadata
> --------------------------------------
>
>                 Key: HUDI-1570
>                 URL: https://issues.apache.org/jira/browse/HUDI-1570
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: Utilities
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Major
>         Attachments: Screen Shot 2021-01-31 at 7.05.55 PM.png
>
>
> Many users want to know the average record size of their data in Hudi 
> storage. Currently there is no easy way for an end user to fetch it. With 
> hudi-cli, the value can be derived from commit metadata, but only through a 
> rough manual calculation. It would be better to store the average record 
> size (total bytes written / total records written) in WriteStats as well as 
> in commit metadata. hudi-cli could then expose it alongside "commit 
> showpartitions", or through a new command such as "commit showmetadata". 
> As of now, the average size can be calculated as bytes written / records 
> written from commit metadata. 
> !Screen Shot 2021-01-31 at 7.05.55 PM.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
