Agree. Then I will modify HBASE-26849 according to this plan, and if it goes well, it will be submitted to branch-1 first next week. Also, if Andrew doesn't mind, I'd be more than happy to be involved in the follow-up renovation. Maybe we can discuss the details later.
Thank you. 张铎(Duo Zhang) <palomino...@gmail.com> 于2022年3月18日周五 12:47写道: > So I think the first step is to introduce different implementations for > different scenarios. > > And for replication, we could use your above approach to solve the problem > first, and then we could try to use Andrew's approach to optimize. > > WDYT? > > 唐天航 <tangtianhang...@gmail.com> 于2022年3月18日周五 11:50写道: > > > If we can accept the performance degradation in the Replication process, > > then I have a solution that can completely solve this problem based on > the > > existing implementation. > > See ReaderBase#seek: > > > > @Override > > public void seek(long pos) throws IOException { > > if (compressionContext != null && emptyCompressionContext) { > > while (next() != null) { > > if (getPosition() == pos) { > > emptyCompressionContext = false; > > break; > > } > > } > > } > > seekOnFs(pos); > > } > > > > The logic of this code is actually to build the LRUCache, but > > emptyCompressionContext is only true at initialization. > > We can modify it to ensure that the hfile is re-read and the LRUCache is > > rebuilded every time when we seek. > > Then replace all external calls to seekOnFs with this method. > > > > The consequence of doing this is a drop in Replication performance, but I > > think at least we can guarantee the correctness of LRUCache. > > > > Do you think this plan makes sense? > > > > (The code above is based on branch-1, but the logic on master is similar) > > > > 张铎(Duo Zhang) <palomino...@gmail.com> 于2022年3月17日周四 23:07写道: > > > > > I agree with Andrew that without a 'versioned' LRUCache, it is not easy > > to > > > implement things correctly. And yes it will impact performance if we > > > implement the 'versioned' logic, for example, using a buffer. > > > > > > But considering the real scenario, we do not always need to support > > > rollback LRUCache. > > > When writing WAL, which is on the critical write path, we do not need > to > > > rollback the LRUCache. > > > And on reading WAL, basically we have two scenarios. > > > The first is splitting WAL, which will impact MTTR. But we can make > sure > > > that we will only read closed WAL files when recovery, and even if we > > fail > > > in the middle, we can just fail the task, the master will schedule a > new > > > splitting task and try again. So we do not need to implement rollback > for > > > this scenario too. > > > The second is replication. I think this is the only place where we need > > to > > > implement rollback, as we will keep tailing the WAL file which is being > > > written currently. And for replication, the performance for reading WAL > > > is not very critical then, so I think it is OK to implement rollback > for > > > this scenario. > > > > > > So basically, I think we could have different LRUCache implementations, > > and > > > also different reader implementations, to suit different scenarios, > then > > we > > > could gain both correctness and performance. > > > > > > Thanks. > > > > > > 唐天航 <tangtianhang...@gmail.com> 于2022年3月17日周四 17:35写道: > > > > > > > Hi duo, > > > > > > > > I have submit a PR <https://github.com/apache/hbase/pull/4237> for > the > > > > doc. > > > > Please kindly help me review it if it is convenient for you. Maybe > need > > > > some polish. > > > > > > > > Thank you, Regards > > > > > > > > > > > > 张铎(Duo Zhang) <palomino...@gmail.com> 于2022年3月16日周三 18:36写道: > > > > > > > > > +1 on updating doc first. You can file an issue for the > documentation > > > > > change, and let's also send an NOTICE email to both dev and user > list > > > to > > > > > warn our users about this. > > > > > > > > > > 唐天航 <tangtianhang...@gmail.com> 于2022年3月16日周三 18:08写道: > > > > > > > > > > > If we only reset the position to the head, yes we can fix it. > > > > > > In fact, 26849 is to fix the problem in this scenario. > > > > > > But unfortunately, we have some other scenarios where we roll > back > > > the > > > > > > position to some intermediate position, such as > > > > > ProtobufLogReader.java#L381 > > > > > > < > > > > > > > > > https://github.com/apache/hbase/blob/branch-1/hbase-server/src/main/java > > > > > > > > /org/apache/hadoop/hbase/regionserver/wal/ProtobufLogReader.java#L381 > > > > > > < > > > > > > > > > > > > > > > https://github.com/apache/hbase/blob/branch-1/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/ProtobufLogReader.java#L381 > > > > > > > > > > > > > > > > > > > I think we cannot rollback the LRUCache too... > > > > > > While my cluster works fine after 26849, the fix is still > > > theoretically > > > > > > incomplete. > > > > > > > > > > > > 张铎(Duo Zhang) <palomino...@gmail.com> 于2022年3月16日周三 17:59写道: > > > > > > > > > > > > > The old WAL compression implementation is buggy when used > > together > > > > with > > > > > > > replication, that's true... > > > > > > > > > > > > > > But in general I think it is fixable, the dict is per file > IIRC, > > > so I > > > > > > think > > > > > > > clearing the LRUCache when resetting to the head of the file > can > > > fix > > > > > the > > > > > > > problem? > > > > > > > > > > > > > > Maybe we need to do some refactoring... > > > > > > > > > > > > > > Thanks. > > > > > > > > > > > > > > 唐天航 <tangtianhang...@gmail.com> 于2022年3月16日周三 16:20写道: > > > > > > > > > > > > > > > Hi masters, > > > > > > > > > > > > > > > > I have created an issue HBASE-26849 > > > > > > > > <https://issues.apache.org/jira/browse/HBASE-26849> about > NPE > > > > caused > > > > > > by > > > > > > > > WAL > > > > > > > > Compression and Replication. > > > > > > > > > > > > > > > > For this problem, I try to reopen a WAL reader when we reset > > the > > > > > > position > > > > > > > > to 0 and it looks like it's working well. But it didn't > > > > fundamentally > > > > > > > solve > > > > > > > > the problem. > > > > > > > > > > > > > > > > Since we have the WAL Compression feature, Replication has > > > > > introduced a > > > > > > > lot > > > > > > > > of new code, and there are many places that reset the HLog > > > > position, > > > > > > such > > > > > > > > as seekOnFs to originalPosition. I guess none of these codes > > > > consider > > > > > > > > compatibility with WAL Compression. Because theoretically we > > can > > > > roll > > > > > > > back > > > > > > > > the position to any position at any time, but the LRUCache in > > the > > > > > > > > corresponding LRUDictionary should also be rolled back, > > otherwise > > > > the > > > > > > > read > > > > > > > > and write link behavior may be inconsistent. But LRUCache > can't > > > > roll > > > > > > back > > > > > > > > at all... > > > > > > > > > > > > > > > > So my thought is, open another issue and add some description > > in > > > > the > > > > > > doc, > > > > > > > > WAL Compression and Replication are not compatible. > > > > > > > > > > > > > > > > What do you think? > > > > > > > > > > > > > > > > Thank you. Regards > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >