[
https://issues.apache.org/jira/browse/HBASE-20649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16535476#comment-16535476
]
Sean Busbey commented on HBASE-20649:
-------------------------------------
okay, I think this can work. We just need to add some more info to the section
explaining how to interpret the output. Before we push forward on this, folks
should read through and see if we're asking too much of operators.
On my test cluster I made a PREFIX_TREE table, inserted data, flushed it,
snapshotted it, cloned the snapshot, then altered both tables to change the
data block encoding to something other than PREFIX_TREE. From there I
proceeded as if I didn't know any of that had happened, relying on the
pre-upgrade tool to figure out how to make things work.
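(For reference, that setup was roughly the following in the hbase shell. The
row contents here are made up; the table, family, and snapshot names match
what shows up in the runs below.)
{code}
hbase(main):001:0> create 'example', { NAME => 'f1', DATA_BLOCK_ENCODING => 'PREFIX_TREE' }
hbase(main):002:0> put 'example', 'row1', 'f1:c1', 'value1'
hbase(main):003:0> flush 'example'
hbase(main):004:0> snapshot 'example', 'some_snapshot'
hbase(main):005:0> clone_snapshot 'some_snapshot', 'clone_of_example'
hbase(main):006:0> alter 'example', { NAME => 'f1', DATA_BLOCK_ENCODING => 'FAST_DIFF' }
hbase(main):007:0> alter 'clone_of_example', { NAME => 'f1', DATA_BLOCK_ENCODING => 'FAST_DIFF' }
{code}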
On each iteration I ran the same command: {{hbase --config /etc/hbase/conf pre-upgrade validate-hfile}}
h3. first run
The tool complains about the file in the {{example}} table, i.e. the one from
the first flush. Here's the output:
{code}
18/07/06 15:46:33 WARN hbck.HFileCorruptionChecker: Found corrupt HFile hdfs://busbey-hbase-20649-1.example.com:8020/hbase/data/default/example/624357cffd1fae4422663c98155de45b/f1/bfc569db5fa543f5ba69bab594a85cea
org.apache.hadoop.hbase.io.hfile.CorruptHFileException: Problem reading HFile Trailer from file hdfs://busbey-hbase-20649-1.example.com:8020/hbase/data/default/example/624357cffd1fae4422663c98155de45b/f1/bfc569db5fa543f5ba69bab594a85cea
    at org.apache.hadoop.hbase.io.hfile.HFile.openReader(HFile.java:545)
    at org.apache.hadoop.hbase.io.hfile.HFile.createReader(HFile.java:611)
    at org.apache.hadoop.hbase.util.hbck.HFileCorruptionChecker.checkHFile(HFileCorruptionChecker.java:101)
    at org.apache.hadoop.hbase.util.hbck.HFileCorruptionChecker.checkColFamDir(HFileCorruptionChecker.java:185)
    at org.apache.hadoop.hbase.util.hbck.HFileCorruptionChecker.checkRegionDir(HFileCorruptionChecker.java:323)
    at org.apache.hadoop.hbase.util.hbck.HFileCorruptionChecker$RegionDirChecker.call(HFileCorruptionChecker.java:408)
    at org.apache.hadoop.hbase.util.hbck.HFileCorruptionChecker$RegionDirChecker.call(HFileCorruptionChecker.java:399)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Invalid data block encoding type in file info: PREFIX_TREE
    at org.apache.hadoop.hbase.io.hfile.HFileDataBlockEncoderImpl.createFromFileInfo(HFileDataBlockEncoderImpl.java:58)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.<init>(HFileReaderImpl.java:246)
    at org.apache.hadoop.hbase.io.hfile.HFile.openReader(HFile.java:538)
    ... 14 more
Caused by: java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.hbase.io.encoding.DataBlockEncoding.PREFIX_TREE
    at java.lang.Enum.valueOf(Enum.java:238)
    at org.apache.hadoop.hbase.io.encoding.DataBlockEncoding.valueOf(DataBlockEncoding.java:31)
    at org.apache.hadoop.hbase.io.hfile.HFileDataBlockEncoderImpl.createFromFileInfo(HFileDataBlockEncoderImpl.java:56)
    ... 16 more
18/07/06 15:46:33 INFO tool.HFileContentValidator: Validating HFile contents under hdfs://busbey-hbase-20649-1.example.com:8020/hbase/archive
18/07/06 15:46:33 WARN tool.HFileContentValidator: Corrupt file: hdfs://busbey-hbase-20649-1.example.com:8020/hbase/data/default/example/624357cffd1fae4422663c98155de45b/f1/bfc569db5fa543f5ba69bab594a85cea
18/07/06 15:46:33 WARN tool.HFileContentValidator: There are 1 corrupted HFiles. Change data block encodings before upgrading. Check https://s.apache.org/prefixtree for instructions.
{code}
I think given the path {{/hbase/data/default/example/}} it's straightforward
to reason "I need to do a major compaction of the example table". So I did that.
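For completeness, that's just the following in the hbase shell. (Note that
{{major_compact}} only queues the request, so give the compaction time to
finish before re-running the validator.)
{code}
hbase(main):001:0> major_compact 'example'
{code}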
h3. second run
The tool complains about the same file, but this time it's in the archive
directory.
{code}
18/07/06 15:50:42 INFO tool.HFileContentValidator: Validating HFile contents under hdfs://busbey-hbase-20649-1.example.com:8020/hbase/archive
18/07/06 15:50:42 WARN hbck.HFileCorruptionChecker: Found corrupt HFile hdfs://busbey-hbase-20649-1.example.com:8020/hbase/archive/data/default/example/624357cffd1fae4422663c98155de45b/f1/bfc569db5fa543f5ba69bab594a85cea
org.apache.hadoop.hbase.io.hfile.CorruptHFileException: Problem reading HFile Trailer from file hdfs://busbey-hbase-20649-1.example.com:8020/hbase/archive/data/default/example/624357cffd1fae4422663c98155de45b/f1/bfc569db5fa543f5ba69bab594a85cea
    at org.apache.hadoop.hbase.io.hfile.HFile.openReader(HFile.java:545)
    at org.apache.hadoop.hbase.io.hfile.HFile.createReader(HFile.java:611)
    at org.apache.hadoop.hbase.util.hbck.HFileCorruptionChecker.checkHFile(HFileCorruptionChecker.java:101)
    at org.apache.hadoop.hbase.util.hbck.HFileCorruptionChecker.checkColFamDir(HFileCorruptionChecker.java:185)
    at org.apache.hadoop.hbase.util.hbck.HFileCorruptionChecker.checkRegionDir(HFileCorruptionChecker.java:323)
    at org.apache.hadoop.hbase.util.hbck.HFileCorruptionChecker$RegionDirChecker.call(HFileCorruptionChecker.java:408)
    at org.apache.hadoop.hbase.util.hbck.HFileCorruptionChecker$RegionDirChecker.call(HFileCorruptionChecker.java:399)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Invalid data block encoding type in file info: PREFIX_TREE
    at org.apache.hadoop.hbase.io.hfile.HFileDataBlockEncoderImpl.createFromFileInfo(HFileDataBlockEncoderImpl.java:58)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.<init>(HFileReaderImpl.java:246)
    at org.apache.hadoop.hbase.io.hfile.HFile.openReader(HFile.java:538)
    ... 14 more
Caused by: java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.hbase.io.encoding.DataBlockEncoding.PREFIX_TREE
    at java.lang.Enum.valueOf(Enum.java:238)
    at org.apache.hadoop.hbase.io.encoding.DataBlockEncoding.valueOf(DataBlockEncoding.java:31)
    at org.apache.hadoop.hbase.io.hfile.HFileDataBlockEncoderImpl.createFromFileInfo(HFileDataBlockEncoderImpl.java:56)
    ... 16 more
18/07/06 15:50:42 WARN tool.HFileContentValidator: Corrupt file: hdfs://busbey-hbase-20649-1.example.com:8020/hbase/archive/data/default/example/624357cffd1fae4422663c98155de45b/f1/bfc569db5fa543f5ba69bab594a85cea
18/07/06 15:50:42 WARN tool.HFileContentValidator: There are 1 corrupted HFiles. Change data block encodings before upgrading. Check https://s.apache.org/prefixtree for instructions.
{code}
This is where I think we need more details in the ref guide. I just major
compacted the {{example}} table, so why is this file still in the archive
directory? Can I delete it?
Let's start with "check if any tables have an HFileLink reference to the file
because of cloning". Given the file's base name in the output
({{bfc569db5fa543f5ba69bab594a85cea}}) we can use HDFS's find tool to look for it:
{code}
hdfs dfs -find /hbase/data -name '*-bfc569db5fa543f5ba69bab594a85cea'
/hbase/data/default/clone_of_example/24482ef2073ababe2f4587af2eccb6cd/f1/example=624357cffd1fae4422663c98155de45b-bfc569db5fa543f5ba69bab594a85cea
{code}
The above output means the "clone_of_example" table in the default namespace is
referencing the file that got flagged. (If the example table hadn't been
compacted yet, the original file would still be under the region directory
whose encoded name is the part between the "=" and the "-".) With a little more
effort we could make an example bash invocation that pulls the region/file
combos out of the tool output and iterates over them doing hdfs finds.
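Here's a rough sketch of that, assuming the validator output was saved off to
a file (the {{validator.log}} name is made up):
{code}
# Pull the flagged HFile base names out of a saved copy of the validator
# output, then look under /hbase/data for clone references to each one.
grep 'Corrupt file:' validator.log | awk -F/ '{print $NF}' | sort -u |
while read -r hfile; do
  echo "checking for references to ${hfile}"
  hdfs dfs -find /hbase/data -name "*-${hfile}"
done
{code}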
In this case, I did a major compaction of "clone_of_example".
h3. third run
Same output as the second run, so it's hard to tell whether we've made progress
or not.
Next question: "What if the file is referenced from a snapshot?"
We can iterate over every snapshot and grep its file listing for the one that
got flagged. Like before, this example just takes the one file name instead of
parsing the tool output and iterating over every flagged file.
{code}
$ for snapshot in $(hbase snapshotinfo -list-snapshots 2>/dev/null | tail -n +2 | cut -f 1 -d \|); do
    echo "checking snapshot named '${snapshot}'"
    hbase snapshotinfo -snapshot "${snapshot}" -files 2>/dev/null | grep bfc569db5fa543f5ba69bab594a85cea
  done
checking snapshot named 'some_snapshot'
 1.1 K example/624357cffd1fae4422663c98155de45b/f1/bfc569db5fa543f5ba69bab594a85cea (archive)
{code}
Found it! From here I can just delete the snapshot. Alternatively I can "clean"
it by cloning it, major compacting the resultant table, making a new snapshot,
and deleting the original. I did the latter.
{code}
hbase(main):001:0> create_namespace 'pre_upgrade_cleanup'
0 row(s) in 0.1090 seconds
hbase(main):002:0> clone_snapshot 'some_snapshot', 'pre_upgrade_cleanup:some_snapshot'
0 row(s) in 0.4880 seconds
hbase(main):003:0> describe 'pre_upgrade_cleanup:some_snapshot'
Table pre_upgrade_cleanup:some_snapshot is ENABLED
pre_upgrade_cleanup:some_snapshot
COLUMN FAMILIES DESCRIPTION
{NAME => 'f1', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'PREFIX_TREE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
1 row(s) in 0.0270 seconds
hbase(main):004:0> alter 'pre_upgrade_cleanup:some_snapshot', { NAME => 'f1', DATA_BLOCK_ENCODING => 'FAST_DIFF' }
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 1.9830 seconds
hbase(main):005:0> major_compact 'pre_upgrade_cleanup:some_snapshot'
0 row(s) in 0.0600 seconds
hbase(main):006:0> delete_snapshot 'some_snapshot'
0 row(s) in 0.0250 seconds
hbase(main):007:0> snapshot 'pre_upgrade_cleanup:some_snapshot', 'some_snapshot'
0 row(s) in 0.3410 seconds
hbase(main):008:0> disable 'pre_upgrade_cleanup:some_snapshot'
0 row(s) in 2.3790 seconds
hbase(main):009:0> drop 'pre_upgrade_cleanup:some_snapshot'
0 row(s) in 1.2540 seconds
hbase(main):010:0> drop_namespace 'pre_upgrade_cleanup'
0 row(s) in 0.0280 seconds
hbase(main):011:0> exit
{code}
h3. fourth run
Same output as the prior two runs. This time, though, checking for HFileLink
references and snapshots turns up nothing.
{code}
$ hdfs dfs -find /hbase/data -name '*-bfc569db5fa543f5ba69bab594a85cea'
$ for snapshot in $(hbase snapshotinfo -list-snapshots 2>/dev/null | tail -n +2 | cut -f 1 -d \|); do
    echo "checking snapshot named '${snapshot}'"
    hbase snapshotinfo -snapshot "${snapshot}" -files 2>/dev/null | grep bfc569db5fa543f5ba69bab594a85cea
  done
checking snapshot named 'some_snapshot'
{code}
So I just waited out the HFile cleaner period, since I forgot what to look for
in the master log to confirm that the cleaner chore had run.
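As an aside, I believe the relevant knob is the archive cleaner's TTL,
{{hbase.master.hfilecleaner.ttl}} (in milliseconds); something like this should
print the effective value for a given config:
{code}
# Print the configured archived-HFile TTL in milliseconds. This prints "null"
# when the key isn't set anywhere explicit, in which case the compiled-in
# default (5 minutes) applies.
hbase --config /etc/hbase/conf org.apache.hadoop.hbase.util.HBaseConfTool hbase.master.hfilecleaner.ttl
{code}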
h3. fifth run
All clear.
{code}
18/07/06 16:30:39 INFO tool.HFileContentValidator: Validating HFile contents under hdfs://busbey-hbase-20649-1.example.com:8020/hbase/archive
18/07/06 16:30:39 INFO tool.HFileContentValidator: There are no incompatible HFiles under hdfs://busbey-hbase-20649-1.example.com:8020/hbase.
{code}
> Validate HFiles do not have PREFIX_TREE DataBlockEncoding
> ---------------------------------------------------------
>
> Key: HBASE-20649
> URL: https://issues.apache.org/jira/browse/HBASE-20649
> Project: HBase
> Issue Type: New Feature
> Reporter: Peter Somogyi
> Assignee: Balazs Meszaros
> Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HBASE-20649.master.001.patch,
> HBASE-20649.master.002.patch, HBASE-20649.master.003.patch,
> HBASE-20649.master.004.patch
>
>
> HBASE-20592 adds a tool to check that column families on the cluster do not
> have PREFIX_TREE encoding.
> Since it is possible that the DataBlockEncoding was already changed but the
> HFiles have not been rewritten yet, we also need a tool that can verify the
> contents of the HFiles in the cluster.