[
https://issues.apache.org/jira/browse/PHOENIX-4094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16130798#comment-16130798
]
Hadoop QA commented on PHOENIX-4094:
------------------------------------
{color:red}-1 overall{color}. Here are the results of testing the latest
attachment
http://issues.apache.org/jira/secure/attachment/12882385/PHOENIX-4094_4.x-HBase-0.98_v1.patch
against 4.x-HBase-0.98 branch at commit
b13413614fef3cdb87233fd1543081e7198d685f.
ATTACHMENT ID: 12882385
{color:green}+1 @author{color}. The patch does not contain any @author
tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include
any new or modified tests.
Please justify why no new tests are needed for this
patch.
Also please list what manual steps were performed to
verify this patch.
{color:red}-1 patch{color}. The patch command could not apply the patch.
Console output:
https://builds.apache.org/job/PreCommit-PHOENIX-Build/1271//console
This message is automatically generated.
> ParallelWriterIndexCommitter incorrectly applies local updates to index tables
> for 4.x-HBase-0.98
> ------------------------------------------------------------------------------------------------
>
> Key: PHOENIX-4094
> URL: https://issues.apache.org/jira/browse/PHOENIX-4094
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.11.0
> Reporter: chenglei
> Assignee: chenglei
> Fix For: 4.12.0
>
> Attachments: PHOENIX-4094_4.x-HBase-0.98_v1.patch
>
>
> I used phoenix-4.x-HBase-0.98 in my HBase cluster. When I restarted my HBase
> cluster at a certain time, I noticed some RegionServers had plenty of
> {{WrongRegionException}} errors like the following:
> {code:java}
> 2017-08-01 11:53:10,669 WARN  [rsync.slave005.bizhbasetest.sjs.ted,60020,1501511894174-index-writer--pool2-t786] regionserver.HRegion: Failed getting lock in batch put, row=\x10\x00\x00\x00913f0eed-6710-4de9-8bac-077a106bb9ae_0
> org.apache.hadoop.hbase.regionserver.WrongRegionException: Requested row out of range for row lock on HRegion BIZARCH_NS_PRODUCT.BIZTRACER_SPAN,90ffd783-b0a3-4f8a-81ef-0a7535fea197_0,1490066612493.463220cd8fad7254481595911e62d74d., startKey='90ffd783-b0a3-4f8a-81ef-0a7535fea197_0', getEndKey()='917fc343-3331-47fa-907c-df83a6f302f7_0', row='\x10\x00\x00\x00913f0eed-6710-4de9-8bac-077a106bb9ae_0'
>         at org.apache.hadoop.hbase.regionserver.HRegion.checkRow(HRegion.java:3539)
>         at org.apache.hadoop.hbase.regionserver.HRegion.getRowLock(HRegion.java:3557)
>         at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:2394)
>         at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2261)
>         at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2213)
>         at org.apache.phoenix.util.IndexUtil.writeLocalUpdates(IndexUtil.java:671)
>         at org.apache.phoenix.hbase.index.write.ParallelWriterIndexCommitter$1.call(ParallelWriterIndexCommitter.java:157)
>         at org.apache.phoenix.hbase.index.write.ParallelWriterIndexCommitter$1.call(ParallelWriterIndexCommitter.java:134)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
> The problem is caused by the ParallelWriterIndexCommitter.write method: at the
> following line 151, if {{allowLocalUpdates}} is true, it writes the index
> mutations to the current data table region unconditionally, which is obviously
> inappropriate:
> {code:java}
> 150                 try {
> 151                     if (allowLocalUpdates && env != null) {
> 152                         try {
> 153                             throwFailureIfDone();
> 154                             IndexUtil.writeLocalUpdates(env.getRegion(), mutations, true);
> 155                             return null;
> 156                         } catch (IOException ignord) {
> 157                             // when it's failed we fall back to the standard & slow way
> 158                             if (LOG.isDebugEnabled()) {
> 159                                 LOG.debug("indexRegion.batchMutate failed and fall back to HTable.batch(). Got error="
> 160                                         + ignord);
> 161                             }
> 162                         }
> 163                     }
> {code}
> If a data table has a global index table, then when we replay the WALs to the
> index table in the Indexer.postOpen method at the following line 691, where the
> {{allowLocalUpdates}} parameter is true, the {{updates}} intended for the
> global index table are incorrectly written to the current data table region:
> {code:java}
> 688             // do the usual writer stuff, killing the server again, if we can't manage to make the index
> 689             // writes succeed again
> 690             try {
> 691                 writer.writeAndKillYourselfOnFailure(updates, true);
> 692             } catch (IOException e) {
> 693                 LOG.error("During WAL replay of outstanding index updates, "
> 694                         + "Exception is thrown instead of killing server during index writing", e);
> 695             }
> 696         } finally {
> {code}
> However, the ParallelWriterIndexCommitter.write method in master and the other
> 4.x branches is correct, as the check at the following lines 150 and 151 shows:
> {code:java}
> 147                 try {
> 148                     if (allowLocalUpdates
> 149                             && env != null
> 150                             && tableReference.getTableName().equals(
> 151                                 env.getRegion().getTableDesc().getNameAsString())) {
> 152                         try {
> 153                             throwFailureIfDone();
> 154                             IndexUtil.writeLocalUpdates(env.getRegion(), mutations, true);
> 155                             return null;
> 156                         } catch (IOException ignord) {
> 157                             // when it's failed we fall back to the standard & slow way
> 158                             if (LOG.isDebugEnabled()) {
> 159                                 LOG.debug("indexRegion.batchMutate failed and fall back to HTable.batch(). Got error="
> 160                                         + ignord);
> 161                             }
> 162                         }
> 163                     }
> {code}
> This inconsistency between branches was introduced by PHOENIX-1734 and
> PHOENIX-3018; because of the lack of unit tests or IT tests for
> Indexer.preWALRestore/postOpen, the inconsistency went undetected.
> BTW, the TrackingParallelWriterIndexCommitter is correct for master and all
> the 4.x branches.
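The missing guard can be sketched as a plain predicate (a hypothetical illustration only, not the attached patch; the class and method names below are made up): local writes are allowed only when the index mutations target the very table whose region we are currently in, which is exactly what the table-name comparison at lines 150 and 151 of the master branch enforces.

```java
// Hypothetical sketch of the guard master/4.x apply before writing index
// mutations to the local region (LocalUpdateGuard/shouldWriteLocally are
// illustrative names, not Phoenix API).
public class LocalUpdateGuard {

    // Local updates are safe only when the mutations' target table equals
    // the table of the current region; global-index mutations target a
    // different table and must take the remote HTable.batch() path instead.
    static boolean shouldWriteLocally(boolean allowLocalUpdates,
                                      String targetTableName,
                                      String currentRegionTableName) {
        return allowLocalUpdates
                && currentRegionTableName != null
                && targetTableName.equals(currentRegionTableName);
    }

    public static void main(String[] args) {
        // Global index: different table name, so no local write.
        System.out.println(shouldWriteLocally(true, "GLOBAL_IDX", "DATA_TABLE"));
        // Local index: mutations share the data table's name, local write is fine.
        System.out.println(shouldWriteLocally(true, "DATA_TABLE", "DATA_TABLE"));
    }
}
```

Without this check, the 4.x-HBase-0.98 branch hands global-index rows to the data table's region, whose key range cannot contain them, hence the WrongRegionException above.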
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)