[
https://issues.apache.org/jira/browse/KUDU-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yingchun Lai resolved KUDU-3318.
--------------------------------
Fix Version/s: 1.16.0
Resolution: Fixed
> Log Block Container metadata consumed too much disk space
> ---------------------------------------------------------
>
> Key: KUDU-3318
> URL: https://issues.apache.org/jira/browse/KUDU-3318
> Project: Kudu
> Issue Type: Improvement
> Components: fs
> Reporter: Yingchun Lai
> Priority: Major
> Fix For: 1.16.0
>
>
> In a log block container, blocks in the .data file are append-only, and a
> companion append-only .metadata file tracks them. Each metadata entry is
> either a CREATE record, which registers a block, or a DELETE record, which
> marks the corresponding CREATE block as deleted.
> When a CREATE and a DELETE entry exist for the same block id, the LBM uses
> hole punching to reclaim disk space in the .data file, but the entries in
> .metadata are not compacted except at bootstrap.
> The only other bound on metadata growth is indirect: it applies once the
> .data file offset reaches its size limit (10GB by default) or the number of
> blocks in the metadata reaches its limit (no limit by default).
> I found a case in a production environment where the metadata consumed a
> large amount of disk space, close to that of the .data files. This is
> wasteful, and it confuses users, who complain that the actual disk usage is
> far larger than their data.
>
> {code:java}
> [root@hybrid01 data]# du -cs *.metadata | sort -n | tail
> 19072 fb58e00979914e95aae7184e3189c8c6.metadata
> 19092 5bbf54294d5948c4a695e240e81d5f80.metadata
> 19168 89da5f3c4dfa469a9935f091bced1856.metadata
> 19200 f27e6ff14bd44fd1838f63f1be35ee64.metadata
> 19256 7b87a5e3c7fa4d3d86dcd3945d6741e1.metadata
> 19256 cf054d1aa7cb4f5cbbbce3b99189bbe1.metadata
> 19496 a6cbb4a284b842deafe6939be051c77c.metadata
> 19568 ba749640df684cb8868d6e51ea3d1b17.metadata
> 19924 e5469080934746e58b0fd2ba29d69c9d.metadata
> 148954280 total // all metadata size ~149GB
> [root@hybrid01 data]# du -cs *.data | sort -n | tail
> 64568 46dfbc5ac94d429b8d79a536727495df.data
> 64568 b4abc59d4eb2473ca267e0b057c8fad7.data
> 65728 576e09ed7e164ddebe5b1702be296619.data
> 66368 88d295f38dec4197bfbc6927e0528bde.data
> 90904 7291e10aafe74f2792168f6146738c5d.data
> 96788 6e72381ae95840f99864baacbc9169af.data
> 98060 c413553491764d039e702577606bac02.data
> 103556 a5db7a9c2e93457aa06103e45f59d8b4.data
> 138200 3876af02694643d49b19b39789460759.data
> 176443948 total // all data size ~176GB
> [root@hybrid01 data]# kudu pbc dump e5469080934746e58b0fd2ba29d69c9d.metadata --oneline | awk '{print $5}' | sort | uniq -c | egrep -v " 2 "
> 1 6165611810 // low live ratio, only 1 live block
> {code}
>
>
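The CREATE/DELETE bookkeeping described above can be illustrated with a small sketch. This is not Kudu's implementation: the `(op, block_id)` tuple shape and the function names `live_ratio` and `compact` are assumptions made for the example, and real metadata records likely carry additional fields. It shows why a container whose blocks are almost all deleted still holds a long metadata log, and what a compaction pass would keep.

```python
# Hypothetical record shape: (op, block_id), where op is "CREATE" or "DELETE".
# Illustration only; it does not parse real .metadata files.

def live_ratio(records):
    """Return (live_block_count, live_blocks / total_records)."""
    created = set()
    deleted = set()
    for op, block_id in records:
        if op == "CREATE":
            created.add(block_id)
        elif op == "DELETE":
            deleted.add(block_id)
    live = created - deleted
    total = len(records)
    return len(live), (len(live) / total if total else 0.0)


def compact(records):
    """Keep only CREATE records whose block was never deleted."""
    deleted = {block_id for op, block_id in records if op == "DELETE"}
    return [
        (op, block_id)
        for op, block_id in records
        if op == "CREATE" and block_id not in deleted
    ]


if __name__ == "__main__":
    log = [
        ("CREATE", 1), ("DELETE", 1),
        ("CREATE", 2),
        ("CREATE", 3), ("DELETE", 3),
    ]
    print(live_ratio(log))  # one live block out of five records
    print(compact(log))     # only the record for block 2 survives
```

In this toy log, four of the five records describe blocks that no longer exist, which mirrors the `uniq -c` output in the issue, where every block id but one appears twice.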
--
This message was sent by Atlassian Jira
(v8.3.4#803005)