Thanks Yuanbo for your response.

> Since the snapshot and symbol link are not popular in HADOOP

Actually, snapshots and symbolic links are both enabled by many companies.
I don't know whether the 'reserved' feature is also used.

> we can try to use global lock(write lock of root inode?)

I think there are two sides to consider:
a. When and how do we become aware that the request path includes a
snapshot or symbolic link? (This assumes directory tree requests only;
block requests will be more difficult.)
b. Whether we lock the root or lock the INode that has the snapshot/symlink
feature, either choice involves pros and cons.
I have not thought this through carefully, and there may be a smoother
solution. I think it will reduce later risk if we think everything over.

> FGL will need more memory as its qps becomes very high. In practice, if the
> percentage of used memory is greater than 90%, GC time will become a major
> problem

Absolutely yes. My concern is what the R/W ratio was and how much STW time
was incurred when the benchmark reached `108K QPS`, which is a remarkable
and surprising result IMO if the R/W ratio is close to production.

Thanks again and good luck!

Best Regards,
- He Xiaoqiao

On Wed, Mar 6, 2024 at 8:39 PM Takanobu Asanuma <tasan...@apache.org> wrote:

> Thank you for sharing the information. My colleague mentioned that Tencent
> Kona 11 might have various improvements and we're interested to see what
> it's like. We would also like to try out shenandoah gc.
>
> - Takanobu
>
> On Wed, Mar 6, 2024 at 15:17, Yuanbo Liu <liuyuanb...@gmail.com> wrote:
>
> > I've heard zgc is better in jdk17 or above, so I think the major problem is
> > that we have to upgrade hadoop code to fit in jdk17.
> > We were using jdk11 with zgc to test NN, and didn't see an impressive
> > improvement.
> >
> > On Wed, Mar 6, 2024 at 11:53 AM Takanobu Asanuma <tasan...@apache.org> wrote:
> >
> > > > We're trying tuning gc options and even new gc engine like zgc, but they
> > > > are not very helpful.
> > >
> > > I'm afraid this is a digression, but could you elaborate on using ZGC for
> > > NameNode? Did you encounter any problems?
> > > I've never heard of using ZGC for NameNode in practice, so I'm curious
> > > about it.
> > >
> > > Regards,
> > > - Takanobu
> > >
> > > On Wed, Mar 6, 2024 at 12:35, Yuanbo Liu <liuyuanb...@gmail.com> wrote:
> > >
> > > > > a. Snapshot, Symbolic link and reserved feature are not mentioned at the
> > > > > design doc, should it be considered
> > > > Yes, I agree. Since the snapshot and symbol link are not popular in HADOOP,
> > > > we can try to use global lock(write lock of root inode?). In our production
> > > > env, we just ignore those features, but in the open source community, these
> > > > should be considered carefully.
> > > >
> > > > > b. For the benchmark result, what Read/Write request ratio? And do you
> > > > > meet any GC issues when reaching
> > > > FGL will need more memory as its qps becomes very high. In practice, if the
> > > > percentage of used memory is greater than 90%, GC time will become a major
> > > > problem. We're trying tuning gc options and even new gc engine like zgc,
> > > > but they are not very helpful.
> > > >
> > > > On Wed, Mar 6, 2024 at 10:51 AM Hui Fei <feihui.u...@gmail.com> wrote:
> > > >
> > > > > Thanks for suggestions.
> > > > >
> > > > > Actually started working on this improvement. And cut the development
> > > > > branch :)
> > > > > From the proposal doc and the current reviewing work, seems that it
> > > > > doesn't touch the existing logic codes too much. It keeps the original
> > > > > logic there.
> > > > >
> > > > > @Yuanbo @Zengqiang XU <zande...@apache.org> Could you share any internal
> > > > > improvement info Xiaoqiao mentioned above?
> > > > >
> > > > > On Mon, Feb 26, 2024 at 19:50, Xiaoqiao He <hexiaoq...@apache.org> wrote:
> > > > >
> > > > > > Thanks for this meaningful proposal. Some nit comments:
> > > > > > a. Snapshot, Symbolic link and reserved feature are not mentioned at the
> > > > > > design doc, should it be considered or different to this core design?
> > > > > > b. For the benchmark result, what Read/Write request ratio? And do you
> > > > > > meet any GC issues when reaching `108K QPS`? If true, would you mind
> > > > > > sharing STW time cost?
> > > > > > c. Is this deployed in your internal cluster now? If true, any
> > > > > > performance benefit differences compared to the benchmark?
> > > > > > d. This is one huge feature IMO. If discussion passes, suggest creating a
> > > > > > single branch to develop and follow-up works.
> > > > > >
> > > > > > Thanks again for this meaningful proposal.
> > > > > >
> > > > > > Best Regards,
> > > > > > - He Xiaoqiao
> > > > > >
> > > > > > On Tue, Feb 20, 2024 at 5:38 PM Yuanbo Liu <liuyuanb...@gmail.com> wrote:
> > > > > >
> > > > > > > Nice to see this feature brought up. We've implemented this feature
> > > > > > > internally and gained significant performance improvement. I'll be glad
> > > > > > > to work on some jiras if necessary.
> > > > > > >
> > > > > > > On Tue, Feb 20, 2024 at 4:41 PM ZanderXu <zande...@apache.org> wrote:
> > > > > > >
> > > > > > > > Thank you everyone for reviewing this ticket.
> > > > > > > >
> > > > > > > > I think if there are no problems with the goal and the overall solution,
> > > > > > > > we are ready to push this ticket forward and I will create some detailed
> > > > > > > > sub-tasks for this ticket.
> > > > > > > >
> > > > > > > > I will split this project into three milestones to make this project
> > > > > > > > cleaner for review and merge.
> > > > > > > > Milestone 1: Replacing the current global lock with two locks, global FS
> > > > > > > > lock and global BM lock.
> > > > > > > > End-user can choose which locking mode to use through configuration.
> > > > > > > > Milestone 2: Replacing the global FS write lock with directory
> > > > > > > > tree-based fine-grained lock.
> > > > > > > > Milestone 3: Replacing the global BM lock with directory tree-based
> > > > > > > > fine-grained lock.
> > > > > > > >
> > > > > > > > Each milestone can be merged into the trunk branch in time, which has
> > > > > > > > multiple benefits:
> > > > > > > > 1. Phased performance improvements can be quickly used by everyone
> > > > > > > > 2. All developers can better understand the implementation ideas of the
> > > > > > > > fine-grained locking mechanism as soon as possible
> > > > > > > > 3. Each milestone is developed based on the latest trunk branch to
> > > > > > > > reduce conflicts
> > > > > > > >
> > > > > > > > If you have any concerns, please feel free to discuss them together.
> > > > > > > > I hope you can join us to push this project forward together, thanks.
> > > > > > > >
> > > > > > > > On Mon, 5 Feb 2024 at 11:33, haiyang hu <haiyang87...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Thank you for raising the issue of this long-standing bottleneck, this
> > > > > > > > > will be a very important improvement!
> > > > > > > > >
> > > > > > > > > Hopefully can participate and push forward together.
> > > > > > > > >
> > > > > > > > > Best Regards~
> > > > > > > > >
> > > > > > > > > On Sat, Feb 3, 2024 at 00:40, Brahma Reddy Battula <bra...@apache.org> wrote:
> > > > > > > > >
> > > > > > > > > > Thanks for bringing this and considering all the history around this.
> > > > > > > > > > One of the outstanding bottleneck(global lock) from a long time.
> > > > > > > > > >
> > > > > > > > > > Hopefully we can push forward this time.
> > > > > > > > > >
> > > > > > > > > > On Fri, Feb 2, 2024 at 12:23 PM Hui Fei <feihui.u...@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Thanks for driving this. It's very meaningful. The performance
> > > > > > > > > > > improvement looks very good.
> > > > > > > > > > >
> > > > > > > > > > > Many users are facing the write performance issue. As far as I know,
> > > > > > > > > > > some companies already implemented the similar idea on their internal
> > > > > > > > > > > branches. But the internal branch is very different from the community
> > > > > > > > > > > one. So it's very hard to be in sync with the community. If this
> > > > > > > > > > > improvement can be involved in the community, that would be great to
> > > > > > > > > > > both end-user and the community.
> > > > > > > > > > >
> > > > > > > > > > > It is very worth doing.
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Feb 2, 2024 at 11:07, Zengqiang XU <zande...@apache.org> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi everyone
> > > > > > > > > > > >
> > > > > > > > > > > > I have started a discussion about NameNode Fine-grained Locking to
> > > > > > > > > > > > improve performance of write operations in NameNode.
> > > > > > > > > > > >
> > > > > > > > > > > > I started this discussion again for several main reasons:
> > > > > > > > > > > > 1. We have implemented it and gained nearly 7x performance
> > > > > > > > > > > > improvement in our prod environment
> > > > > > > > > > > > 2. Many other companies made similar improvements based on their
> > > > > > > > > > > > internal branch.
> > > > > > > > > > > > 3. This topic has been discussed for a long time, but still without
> > > > > > > > > > > > any results.
> > > > > > > > > > > >
> > > > > > > > > > > > I hope we can push this important improvement in the community so
> > > > > > > > > > > > that all end-users can enjoy this significant improvement.
> > > > > > > > > > > >
> > > > > > > > > > > > I'd really appreciate it if you can join in and work with me to push
> > > > > > > > > > > > this feature forward.
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks very much.
> > > > > > > > > > > >
> > > > > > > > > > > > Ticket: HDFS-17366 <https://issues.apache.org/jira/browse/HDFS-17366>
> > > > > > > > > > > > Design: NameNode Fine-grained locking based on directory tree
> > > > > > > > > > > > <https://docs.google.com/document/d/1bVBQcI4jfzS0UrczB7UhsrQTXmrERGvBV-a9W3HCCjk/edit?usp=sharing>
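
For readers following the locking discussion in this thread, here is a rough sketch of the idea of directory tree-based locks with the root write lock as a fallback for snapshot and reserved paths. It is only an illustration under stated assumptions, not the HDFS-17366 design or code: the class `FglSketch` and all of its methods are hypothetical names, and detection is done on the path string only (the `.snapshot` and `.reserved` components), which cannot see symbolic links.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical sketch only; not the HDFS-17366 implementation. */
public class FglSketch {
  // One read/write lock per directory-tree path; "/" doubles as the global lock.
  // Entries are never evicted here; a real implementation would have to bound this.
  private final Map<String, ReentrantReadWriteLock> locks = new ConcurrentHashMap<>();

  /**
   * True if the path contains a snapshot or reserved component.
   * Symbolic links cannot be detected from the string alone; that
   * would require resolving inodes, which this sketch does not do.
   */
  static boolean needsGlobalLock(String path) {
    for (String c : path.split("/")) {
      if (c.equals(".snapshot") || c.equals(".reserved")) {
        return true;
      }
    }
    return false;
  }

  /** For "/a/b" returns ["/", "/a", "/a/b"]. */
  static List<String> prefixes(String path) {
    List<String> out = new ArrayList<>();
    out.add("/");
    StringBuilder sb = new StringBuilder();
    for (String c : path.split("/")) {
      if (c.isEmpty()) {
        continue;
      }
      sb.append('/').append(c);
      out.add(sb.toString());
    }
    return out;
  }

  private ReentrantReadWriteLock lockFor(String p) {
    return locks.computeIfAbsent(p, k -> new ReentrantReadWriteLock());
  }

  /** Acquires locks for a write on the path; returns them in acquisition order. */
  public List<Lock> lockForWrite(String path) {
    List<Lock> held = new ArrayList<>();
    if (needsGlobalLock(path)) {
      // Fallback: write-lock the root, which excludes every other request
      // because ordinary requests hold at least a read lock on "/".
      Lock root = lockFor("/").writeLock();
      root.lock();
      held.add(root);
      return held;
    }
    // Top-down order: read locks on ancestors, write lock on the target.
    List<String> ps = prefixes(path);
    for (int i = 0; i < ps.size(); i++) {
      ReentrantReadWriteLock l = lockFor(ps.get(i));
      Lock lk = (i == ps.size() - 1) ? l.writeLock() : l.readLock();
      lk.lock();
      held.add(lk);
    }
    return held;
  }

  /** Releases the locks in reverse acquisition order. */
  public static void unlock(List<Lock> held) {
    for (int i = held.size() - 1; i >= 0; i--) {
      held.get(i).unlock();
    }
  }
}
```

Because every request acquires locks from the root downward, two writers on disjoint subtrees share only read locks on common ancestors, while a snapshot or reserved path serializes against everything. Whether to lock the root or only the INode carrying the snapshot feature is exactly the trade-off raised above, and this sketch takes the simpler root-lock option.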