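Possibly snapshots: files that were deleted or changed after a snapshot was taken are still kept in the namespace, and therefore in the FSImage, even though they no longer show up in the live file count. It could also happen if each of your files carries a lot of xattrs or ACLs.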
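
To put the gap in perspective, here is a rough back-of-the-envelope calculation using only the figures quoted below. It is a sketch, not a measurement: it assumes 1 GB = 10**9 bytes and treats "~3M" and "~200M" as exact object counts, and the cluster labels are made up for illustration.

```python
# Rough bytes-per-object estimate from the figures quoted in this thread.
# Assumptions: 1 GB = 10**9 bytes; "~3M" and "~200M" are taken as exact counts.
GB = 10**9

# Hypothetical labels; (FSImage size in bytes, number of files/blocks).
clusters = {
    "problem cluster": (55 * GB, 3_000_000),
    "similar cluster": (1 * GB, 3_000_000),
    "large cluster": (45 * GB, 200_000_000),
}


def bytes_per_object(image_bytes, objects):
    """Average number of FSImage bytes per file/block object."""
    return image_bytes / objects


for name, (size, count) in clusters.items():
    print(f"{name}: {bytes_per_object(size, count):,.0f} bytes per object")
```

Under those assumptions the two normal clusters come out at a few hundred bytes per object, while the problem cluster is around 18 KB per object. That is consistent with the image holding a lot of data that is not counted as live files/blocks, or with unusually heavy per-file metadata.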

On Tue, Dec 1, 2020 at 3:52 PM Jason Wen <zhenshan....@workday.com.invalid> wrote:

> Hi,
>
> We are encountering some odd FSImage size issue in one of our Hadoop
> clusters. The Namenode only has about 3M files/blocks, but the FSImage size
> is about 55GB.
> We have never seen this kind of gap between number of files/blocks vs
> FSImage size. As a comparison, we have another similar cluster which also
> has ~3M files/blocks, but the FSImage size is only ~1GB. We also have
> another cluster that has ~200M files/blocks but the FSImage size is only
> ~45GB.
>
> My understanding is the FSImage size or the heap memory usage of Namenode
> is mostly determined by the number of files/blocks. The gap that we
> observed seems caused by other factors in Namenode FSImage/Namespace.
>
> Can anyone shed the light what could cause this FSImage issue?
>
> Thanks,
> Jason