[ https://issues.apache.org/jira/browse/HBASE-14177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
James Estes updated HBASE-14177:
--------------------------------
Description:
After adding a large row, scanning that row back returns an empty result. After
a few attempts the scan succeeds (all attempts run over the same data, on an
HBase instance receiving no other writes).
Looking at the logs, this seems to happen when the client is under memory
pressure and several Full GCs occur. The client then logs messages indicating
that region locations are being removed from its local cache:
2015-07-31 12:50:24,647 [main] DEBUG
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation -
Removed 192.168.1.131:50981 as a location of
big_row_1438368609944,,1438368610048.880c849594807bdc7412f4f982337d6c. for
tableName=big_row_1438368609944 from cache
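To correlate these cache removals with GC activity, it can help to pull the server, region and table out of each such message. The following is only a minimal sketch: the class name CacheEvictionLogParser and its parse method are hypothetical helpers, not part of HBase, and the pattern assumes the message appears on a single line in the real log with exactly the wording shown above.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not an HBase class): extracts fields from the
// "Removed ... as a location of ... from cache" DEBUG message quoted above.
public class CacheEvictionLogParser {
    // Assumes the wording of the message matches the sample in this report.
    private static final Pattern REMOVED = Pattern.compile(
        "Removed (\\S+) as a location of (\\S+) for tableName=(\\S+) from cache");

    /** Returns {server, region, table} if the line is a cache-removal message. */
    public static Optional<String[]> parse(String line) {
        Matcher m = REMOVED.matcher(line);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new String[] { m.group(1), m.group(2), m.group(3) });
    }

    public static void main(String[] args) {
        String line = "2015-07-31 12:50:24,647 [main] DEBUG "
            + "org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - "
            + "Removed 192.168.1.131:50981 as a location of "
            + "big_row_1438368609944,,1438368610048.880c849594807bdc7412f4f982337d6c. "
            + "for tableName=big_row_1438368609944 from cache";
        // Print the extracted fields, or a note if the line does not match.
        Optional<String[]> parsed = parse(line);
        if (parsed.isPresent()) {
            String[] f = parsed.get();
            System.out.println("server=" + f[0] + " region=" + f[1] + " table=" + f[2]);
        } else {
            System.out.println("not a cache-removal message");
        }
    }
}
```

Counting matches per time window and comparing them against the timestamps in the client's GC log would show whether the removals cluster around the Full GC pauses.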
Blaming the GC may sound fanciful, but if the test is run with -Xms4g -Xmx4g
then it always passes on the first scan attempt. Maybe the pause is enough to
remove something from the cache, or the client is using weak references
somewhere?
More info:
http://mail-archives.apache.org/mod_mbox/hbase-user/201507.mbox/%3CCAE8tVdnFf%3Dob569%3DfJkpw1ndVWOVTkihYj9eo6qt0FrzihYHgw%40mail.gmail.com%3E
Test used to reproduce:
https://github.com/housejester/hbase-debugging#fullgctest
Repro'ed in:
0.98.12 client/server
0.98.13 client, 0.98.12 server
0.98.13 client/server
1.1.0 client, 0.98.13 server
0.98.13 client, 1.1.0 server
0.98.12 client, 1.1.0 server
Unable to repro in:
1.1.0 client/server
> Full GC on client may lead to missing scan results
> --------------------------------------------------
>
> Key: HBASE-14177
> URL: https://issues.apache.org/jira/browse/HBASE-14177
> Project: HBase
> Issue Type: Bug
> Components: Client
> Affects Versions: 1.1.0, 0.98.12, 0.98.13
> Reporter: James Estes
> Priority: Critical
> Labels: dataloss
> Fix For: 2.0.0, 1.3.0