Agree.
Then I will modify HBASE-26849 according to this plan, and if it goes well,
it will be submitted to branch-1 first next week.
Also, if Andrew doesn't mind, I'd be more than happy to be involved in the
follow-up renovation. Maybe we can discuss the details later.

Thank you.

张铎(Duo Zhang) <palomino...@gmail.com> 于2022年3月18日周五 12:47写道:

> So I think the first step is to introduce different implementations for
> different scenarios.
>
> And for replication, we could use your above approach to solve the problem
> first, and then we could try to use Andrew's approach to optimize.
>
> WDYT?
>
> 唐天航 <tangtianhang...@gmail.com> 于2022年3月18日周五 11:50写道:
>
> > If we can accept the performance degradation in the Replication process,
> > then I have a solution that can completely solve this problem based on
> the
> > existing implementation.
> > See ReaderBase#seek:
> >
> > @Override
> > public void seek(long pos) throws IOException {
> >   if (compressionContext != null && emptyCompressionContext) {
> >     while (next() != null) {
> >       if (getPosition() == pos) {
> >         emptyCompressionContext = false;
> >         break;
> >       }
> >     }
> >   }
> >   seekOnFs(pos);
> > }
> >
> > The logic of this code is actually to build the LRUCache, but
> > emptyCompressionContext is only true at initialization.
> > We can modify it to ensure that the hfile is re-read and the LRUCache is
> > rebuilded every time when we seek.
> > Then replace all external calls to seekOnFs with this method.
> >
> > The consequence of doing this is a drop in Replication performance, but I
> > think at least we can guarantee the correctness of LRUCache.
> >
> > Do you think this plan makes sense?
> >
> > (The code above is based on branch-1, but the logic on master is similar)
> >
> > 张铎(Duo Zhang) <palomino...@gmail.com> 于2022年3月17日周四 23:07写道:
> >
> > > I agree with Andrew that without a 'versioned' LRUCache, it is not easy
> > to
> > > implement things correctly. And yes it will impact performance if we
> > > implement the 'versioned' logic, for example, using a buffer.
> > >
> > > But considering the real scenario, we do not always need to support
> > > rollback LRUCache.
> > > When writing WAL, which is on the critical write path, we do not need
> to
> > > rollback the LRUCache.
> > > And on reading WAL, basically we have two scenarios.
> > > The first is splitting WAL, which will impact MTTR. But we can make
> sure
> > > that we will only read closed WAL files when recovery, and even if we
> > fail
> > > in the middle, we can just fail the task, the master will schedule a
> new
> > > splitting task and try again. So we do not need to implement rollback
> for
> > > this scenario too.
> > > The second is replication. I think this is the only place where we need
> > to
> > > implement rollback, as we will keep tailing the WAL file which is being
> > > written currently. And for replication, the performance for reading WAL
> > > is not very critical then, so I think it is OK to implement rollback
> for
> > > this scenario.
> > >
> > > So basically, I think we could have different LRUCache implementations,
> > and
> > > also different reader implementations, to suit different scenarios,
> then
> > we
> > > could gain both correctness and performance.
> > >
> > > Thanks.
> > >
> > > 唐天航 <tangtianhang...@gmail.com> 于2022年3月17日周四 17:35写道:
> > >
> > > > Hi duo,
> > > >
> > > > I have submit a PR <https://github.com/apache/hbase/pull/4237> for
> the
> > > > doc.
> > > > Please kindly help me review it if it is convenient for you. Maybe
> need
> > > > some polish.
> > > >
> > > > Thank you, Regards
> > > >
> > > >
> > > > 张铎(Duo Zhang) <palomino...@gmail.com> 于2022年3月16日周三 18:36写道:
> > > >
> > > > > +1 on updating doc first. You can file an issue for the
> documentation
> > > > > change, and let's also send an NOTICE email to both dev and user
> list
> > > to
> > > > > warn our users about this.
> > > > >
> > > > > 唐天航 <tangtianhang...@gmail.com> 于2022年3月16日周三 18:08写道:
> > > > >
> > > > > > If we only reset the position to the head, yes we can fix it.
> > > > > > In fact, 26849 is to fix the problem in this scenario.
> > > > > > But unfortunately, we have some other scenarios where we roll
> back
> > > the
> > > > > > position to some intermediate position, such as
> > > > > ProtobufLogReader.java#L381
> > > > > > <
> > > > >
> > >
> https://github.com/apache/hbase/blob/branch-1/hbase-server/src/main/java
> > > > > >
> > /org/apache/hadoop/hbase/regionserver/wal/ProtobufLogReader.java#L381
> > > > > > <
> > > > >
> > > >
> > >
> >
> https://github.com/apache/hbase/blob/branch-1/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/ProtobufLogReader.java#L381
> > > > > >
> > > > > > >
> > > > > > I think we cannot rollback the LRUCache too...
> > > > > > While my cluster works fine after 26849, the fix is still
> > > theoretically
> > > > > > incomplete.
> > > > > >
> > > > > > 张铎(Duo Zhang) <palomino...@gmail.com> 于2022年3月16日周三 17:59写道:
> > > > > >
> > > > > > > The old WAL compression implementation is buggy when used
> > together
> > > > with
> > > > > > > replication, that's true...
> > > > > > >
> > > > > > > But in general I think it is fixable, the dict is per file
> IIRC,
> > > so I
> > > > > > think
> > > > > > > clearing the LRUCache when resetting to the head of the file
> can
> > > fix
> > > > > the
> > > > > > > problem?
> > > > > > >
> > > > > > > Maybe we need to do some refactoring...
> > > > > > >
> > > > > > > Thanks.
> > > > > > >
> > > > > > > 唐天航 <tangtianhang...@gmail.com> 于2022年3月16日周三 16:20写道:
> > > > > > >
> > > > > > > > Hi masters,
> > > > > > > >
> > > > > > > > I have created an issue HBASE-26849
> > > > > > > > <https://issues.apache.org/jira/browse/HBASE-26849> about
> NPE
> > > > caused
> > > > > > by
> > > > > > > > WAL
> > > > > > > > Compression and Replication.
> > > > > > > >
> > > > > > > > For this problem, I try to reopen a WAL reader when we reset
> > the
> > > > > > position
> > > > > > > > to 0 and it looks like it's working well. But it didn't
> > > > fundamentally
> > > > > > > solve
> > > > > > > > the problem.
> > > > > > > >
> > > > > > > > Since we have the WAL Compression feature, Replication has
> > > > > introduced a
> > > > > > > lot
> > > > > > > > of new code, and there are many places that reset the HLog
> > > > position,
> > > > > > such
> > > > > > > > as seekOnFs to originalPosition. I guess none of these codes
> > > > consider
> > > > > > > > compatibility with WAL Compression. Because theoretically we
> > can
> > > > roll
> > > > > > > back
> > > > > > > > the position to any position at any time, but the LRUCache in
> > the
> > > > > > > > corresponding LRUDictionary should also be rolled back,
> > otherwise
> > > > the
> > > > > > > read
> > > > > > > > and write link behavior may be inconsistent. But LRUCache
> can't
> > > > roll
> > > > > > back
> > > > > > > > at all...
> > > > > > > >
> > > > > > > > So my thought is, open another issue and add some description
> > in
> > > > the
> > > > > > doc,
> > > > > > > > WAL Compression and Replication are not compatible.
> > > > > > > >
> > > > > > > > What do you think?
> > > > > > > >
> > > > > > > > Thank you. Regards
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Reply via email to