Thanks Yuanbo for your response.

> Since the snapshot and symbolic link features are not popular in Hadoop

Actually, Snapshot and Symlink are both enabled by many companies. I don't
have information on whether the 'reserved' feature is also used.

> we can try to use global lock (write lock of root inode?)

I think there are two sides to consider:
a. When and how do we become aware that the request path includes a
Snapshot or Symlink? (This assumes directory tree requests only; block
requests will be more difficult.)
b. Whether we lock the root or lock the INode that carries the
Snapshot/Symlink feature, each choice has its own pros and cons.
I have not thought this through carefully, and there may be a smoother
solution. I think working everything out up front will reduce the risk
later on.
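
To make the two sides concrete, here is a minimal sketch in Java. It is not
HDFS code: `LockModeChooser`, `Mode` and `withLock` are hypothetical names,
the single `ReentrantReadWriteLock` only stands in for the root inode's lock,
and the caller is assumed to already know whether the path resolves through
a symlink. The only HDFS conventions it relies on are the `.snapshot` path
component and the `/.reserved` prefix.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

/**
 * Hypothetical sketch: decide per request whether a path may use
 * fine-grained locking or must fall back to a global (root) write lock.
 */
final class LockModeChooser {
    enum Mode { FINE_GRAINED, GLOBAL_ROOT }

    // Stand-in for the lock of the root inode (an assumption of this sketch).
    private final ReentrantReadWriteLock rootLock = new ReentrantReadWriteLock();

    /** Side (a): detect whether the request touches a special feature. */
    static Mode choose(String path, boolean resolvesThroughSymlink) {
        if (resolvesThroughSymlink
                || path.equals("/.reserved")
                || path.startsWith("/.reserved/")) {
            return Mode.GLOBAL_ROOT;
        }
        for (String component : path.split("/")) {
            if (component.equals(".snapshot")) {
                return Mode.GLOBAL_ROOT;
            }
        }
        return Mode.FINE_GRAINED;
    }

    /**
     * Side (b), "lock root" variant: special requests take the root write
     * lock; everything else takes the root read lock, which here stands in
     * for the real per-inode fine-grained locks.
     */
    <T> T withLock(String path, boolean resolvesThroughSymlink, Supplier<T> op) {
        Mode mode = choose(path, resolvesThroughSymlink);
        Lock lock = (mode == Mode.GLOBAL_ROOT)
                ? rootLock.writeLock()
                : rootLock.readLock();
        lock.lock();
        try {
            return op.get();
        } finally {
            lock.unlock();
        }
    }
}
```

In this sketch every fine-grained request shares the root read lock, so one
snapshot or symlink request holding the root write lock excludes all of
them. That is the main cost of locking root rather than only the affected
INode.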

> FGL will need more memory as its qps becomes very high. In practice, if the
> percentage of used memory is greater than 90%, GC time will become a major
> problem

Absolutely yes. My concern is what the R/W ratio was and how much STW time
was incurred when the benchmark reached `108K QPS`, which is a remarkable
and surprising result if the R/W ratio is close to production, IMO.
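
For comparing numbers across clusters, here is a rough sketch of how the
heap usage ratio and GC time could be sampled from inside the JVM with the
standard `java.lang.management` beans. `GcSnapshot` and its methods are
names made up for illustration. Note that `getCollectionTime()` is only an
approximate accumulated collection time; for concurrent collectors such as
ZGC it is not necessarily pure STW pause time, so GC logs remain the more
reliable source for pauses.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

/** Hypothetical helper: one sample of cumulative GC counters and heap usage. */
final class GcSnapshot {
    final long collections;       // total collections across all collectors
    final long collectionMillis;  // approximate total collection time in ms
    final double heapUsedRatio;   // used heap / heap limit, in [0, 1]

    GcSnapshot(long collections, long collectionMillis, double heapUsedRatio) {
        this.collections = collections;
        this.collectionMillis = collectionMillis;
        this.heapUsedRatio = heapUsedRatio;
    }

    /** Read the current counters from the platform MXBeans. */
    static GcSnapshot capture() {
        long count = 0;
        long millis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the collector does not report a value.
            count += Math.max(0, gc.getCollectionCount());
            millis += Math.max(0, gc.getCollectionTime());
        }
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        // getMax() is -1 when no maximum is defined; fall back to committed.
        long limit = heap.getMax() > 0 ? heap.getMax() : heap.getCommitted();
        return new GcSnapshot(count, millis, (double) heap.getUsed() / limit);
    }

    /** Fraction of wall-clock time spent collecting between two samples. */
    static double gcTimeFraction(GcSnapshot before, GcSnapshot after, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("elapsedMillis must be positive");
        }
        return (double) (after.collectionMillis - before.collectionMillis) / elapsedMillis;
    }
}
```

Taking one sample before and one after the benchmark run, and reporting
`gcTimeFraction` together with `heapUsedRatio`, would make it easier to tell
whether the 90% heap threshold mentioned above was reached during the run.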

Thanks again and good luck!

Best Regards,
- He Xiaoqiao

On Wed, Mar 6, 2024 at 8:39 PM Takanobu Asanuma <tasan...@apache.org> wrote:

> Thank you for sharing the information. My colleague mentioned that Tencent
> Kona 11 might have various improvements and we're interested to see what
> it's like. We would also like to try out shenandoah gc.
>
> - Takanobu
>
> On Wed, Mar 6, 2024 at 15:17, Yuanbo Liu <liuyuanb...@gmail.com> wrote:
>
> > I've heard ZGC is better in JDK 17 or above, so I think the major problem
> > is that we have to upgrade the Hadoop code to run on JDK 17.
> > We were using JDK 11 with ZGC to test the NN, and didn't see an impressive
> > improvement.
> >
> > On Wed, Mar 6, 2024 at 11:53 AM Takanobu Asanuma <tasan...@apache.org>
> > wrote:
> >
> > > > We're trying tuning gc options and even new gc engine like zgc, but
> > they
> > > are not very helpful.
> > >
> > > I'm afraid this is a digression, but could you elaborate on using ZGC
> for
> > > NameNode? Did you encounter any problems?
> > > I've never heard of using ZGC for NameNode in practice, so I'm curious
> > > about it.
> > >
> > > Regards,
> > > - Takanobu
> > >
> > >
> > > On Wed, Mar 6, 2024 at 12:35, Yuanbo Liu <liuyuanb...@gmail.com> wrote:
> > >
> > > > > a. Snapshot, Symbolic link and reserved feature are not mentioned at
> > > > > the design doc, should it be considered
> > > > Yes, I agree. Since the snapshot and symbolic link features are not
> > > > popular in Hadoop, we can try to use a global lock (write lock of the
> > > > root inode?). In our production env, we just ignore those features, but
> > > > in the open source community, these should be considered carefully.
> > > >
> > > > > b. For the benchmark result, what Read/Write request ratio? And do
> > > > > you meet any GC issues when reaching
> > > > FGL will need more memory as its QPS becomes very high. In practice, if
> > > > the percentage of used memory is greater than 90%, GC time will become a
> > > > major problem. We're trying to tune GC options and even new GC engines
> > > > like ZGC, but they have not been very helpful.
> > > >
> > > >
> > > >
> > > > On Wed, Mar 6, 2024 at 10:51 AM Hui Fei <feihui.u...@gmail.com>
> wrote:
> > > >
> > > > > Thanks for suggestions.
> > > > >
> > > > > Work on this improvement has actually started, and the development
> > > > > branch has been cut :)
> > > > > From the proposal doc and the current review work, it seems that it
> > > > > doesn't touch the existing logic code too much. It keeps the original
> > > > > logic there.
> > > > >
> > > > > @Yuanbo @Zengqiang XU <zande...@apache.org>  Could you share any
> > > > internal
> > > > > improvement info Xiaoqiao mentioned above?
> > > > >
> > > > > On Mon, Feb 26, 2024 at 19:50, Xiaoqiao He <hexiaoq...@apache.org> wrote:
> > > > >
> > > > >> Thanks for this meaningful proposal. Some nit comments:
> > > > >> a. Snapshot, Symbolic link and reserved feature are not mentioned
> at
> > > the
> > > > >> design doc, should it be considered
> > > > >> or different to this core design?
> > > > >> b. For the benchmark result, what Read/Write request ratio? And do
> > you
> > > > >> meet
> > > > >> any GC issues when reaching
> > > > >> `108K QPS`? If true, would you mind sharing STW time cost?
> > > > >> c. Is this deployed in your internal cluster now? If true,  any
> > > > >> performance
> > > > >> benefit differences compare to the
> > > > >> benchmark?
> > > > >> d. This is one huge feature IMO, If discussion passes, suggest
> > > creating
> > > > a
> > > > >> single branch to develop and follow-up
> > > > >> works.
> > > > >>
> > > > >> Thanks again for this meaningful proposal.
> > > > >>
> > > > >> Best Regards,
> > > > >> - He Xiaoqiao
> > > > >>
> > > > >>
> > > > >> On Tue, Feb 20, 2024 at 5:38 PM Yuanbo Liu <liuyuanb...@gmail.com
> >
> > > > wrote:
> > > > >>
> > > > >> > Nice to see this feature brought up. We've implemented this
> > feature
> > > > >> > internally and gained significant performance improvement. I'll
> be
> > > > glad
> > > > >> to
> > > > >> > work on some jiras if necessary.
> > > > >> >
> > > > >> >
> > > > >> > On Tue, Feb 20, 2024 at 4:41 PM ZanderXu <zande...@apache.org>
> > > wrote:
> > > > >> >
> > > > >> > > Thank you everyone for reviewing this ticket.
> > > > >> > >
> > > > >> > > I think if there are no problems with the goal and the overall
> > > > >> solution,
> > > > >> > we
> > > > >> > > are ready to push this ticket forward and I will create some
> > > > detailed
> > > > >> > > sub-tasks for this ticket.
> > > > >> > >
> > > > >> > > I will split this project into three milestones to make this
> > > project
> > > > >> > > cleaner for review and merge.
> > > > >> > > Milestone 1: Replacing the current global lock with two locks,
> > > > global
> > > > >> FS
> > > > >> > > lock and global BM lock. End-user can choose which locking
> mode
> > to
> > > > use
> > > > >> > > through configuration.
> > > > >> > > Milestone 2: Replacing the global FS write lock with directory
> > > > >> tree-based
> > > > >> > > fine-grained lock.
> > > > >> > > Milestone 3: Replacing the global BM lock with directory
> > > tree-based
> > > > >> > > fine-grained lock.
> > > > >> > >
> > > > >> > > Each milestone can be merged into the trunk branch in time,
> > which
> > > > has
> > > > >> > > multiple benefits:
> > > > >> > > 1. Phased performance improvements can be quickly used by
> > everyone
> > > > >> > > 2. All developers can better understand the implementation
> ideas
> > > of
> > > > >> the
> > > > >> > > fine-grained locking mechanism as soon as possible
> > > > >> > > 3. Each milestone is developed based on the latest trunk
> branch
> > to
> > > > >> reduce
> > > > >> > > conflicts
> > > > >> > >
> > > > >> > > If you have any concerns, please feel free to discuss them
> > > together.
> > > > >> > > I hope you can join us to push this project forward together,
> > > > thanks.
> > > > >> > >
> > > > >> > >
> > > > >> > > On Mon, 5 Feb 2024 at 11:33, haiyang hu <
> haiyang87...@gmail.com
> > >
> > > > >> wrote:
> > > > >> > >
> > > > >> > > > Thank you for raising the issue of this long-standing
> > > bottleneck,
> > > > >> this
> > > > >> > > > will be a very important improvement!
> > > > >> > > >
> > > > >> > > > Hopefully can participate and push forward together.
> > > > >> > > >
> > > > >> > > > Best Regards~
> > > > >> > > >
> > > > > >> > > > On Sat, Feb 3, 2024 at 00:40, Brahma Reddy Battula <bra...@apache.org> wrote:
> > > > >> > > >
> > > > >> > > >> Thanks for bringing this and considering all the history
> > around
> > > > >> this.
> > > > > >> > > >> This (the global lock) has been one of the outstanding
> > > > > >> > > >> bottlenecks for a long time.
> > > > >> > > >>
> > > > >> > > >> Hopefully we can push forward this time.
> > > > >> > > >>
> > > > >> > > >>
> > > > >> > > >> On Fri, Feb 2, 2024 at 12:23 PM Hui Fei <
> > feihui.u...@gmail.com
> > > >
> > > > >> > wrote:
> > > > >> > > >>
> > > > >> > > >> > Thanks for driving this. It's very meaningful. The
> > > performance
> > > > >> > > >> improvement
> > > > >> > > >> > looks very good.
> > > > >> > > >> >
> > > > >> > > >> > Many users are facing the write performance issue. As far
> > as
> > > I
> > > > >> know,
> > > > >> > > >> some
> > > > >> > > >> > companies already implemented the similar idea on their
> > > > internal
> > > > >> > > >> branches.
> > > > >> > > >> > But the internal branch is very different from the
> > community
> > > > >> one. So
> > > > >> > > >> it's
> > > > >> > > >> > very hard to be in sync with the community. If this
> > > improvement
> > > > >> can
> > > > >> > be
> > > > >> > > >> > involved in the community, that would be great to both
> > > end-user
> > > > >> and
> > > > >> > > the
> > > > >> > > >> > community.
> > > > >> > > >> >
> > > > >> > > >> > It is very worth doing.
> > > > >> > > >> >
> > > > > >> > > >> > On Fri, Feb 2, 2024 at 11:07, Zengqiang XU <zande...@apache.org> wrote:
> > > > >> > > >> >
> > > > >> > > >> > > Hi everyone
> > > > >> > > >> > >
> > > > >> > > >> > > I have started a discussion about NameNode Fine-grained
> > > > >> Locking to
> > > > >> > > >> > improve
> > > > >> > > >> > > performance of write operations in NameNode.
> > > > >> > > >> > >
> > > > > >> > > >> > > I started this discussion again for several main reasons:
> > > > >> > > >> > > 1. We have implemented it and gained nearly 7x
> > performance
> > > > >> > > >> improvement in
> > > > >> > > >> > > our prod environment
> > > > >> > > >> > > 2. Many other companies made similar improvements based
> > on
> > > > >> their
> > > > >> > > >> internal
> > > > >> > > >> > > branch.
> > > > >> > > >> > > 3. This topic has been discussed for a long time, but
> > still
> > > > >> > without
> > > > >> > > >> any
> > > > >> > > >> > > results.
> > > > >> > > >> > >
> > > > >> > > >> > > I hope we can push this important improvement in the
> > > > community
> > > > >> so
> > > > >> > > that
> > > > >> > > >> > all
> > > > >> > > >> > > end-users can enjoy this significant improvement.
> > > > >> > > >> > >
> > > > >> > > >> > > I'd really appreciate you can join in and work with me
> to
> > > > push
> > > > >> > this
> > > > >> > > >> > feature
> > > > >> > > >> > > forward.
> > > > >> > > >> > >
> > > > >> > > >> > > Thanks very much.
> > > > >> > > >> > >
> > > > > >> > > >> > > Ticket: HDFS-17366 <https://issues.apache.org/jira/browse/HDFS-17366>
> > > > > >> > > >> > > Design: NameNode Fine-grained locking based on directory tree
> > > > > >> > > >> > > <https://docs.google.com/document/d/1bVBQcI4jfzS0UrczB7UhsrQTXmrERGvBV-a9W3HCCjk/edit?usp=sharing>
