[ 
https://issues.apache.org/jira/browse/HBASE-20649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16535476#comment-16535476
 ] 

Sean Busbey commented on HBASE-20649:
-------------------------------------

okay, I think this can work. We just need to add some more info to the section 
explaining how to interpret the output. Before we push forward on this, folks 
should read through and see if we're asking too much of operators.

On my test cluster I created a table with the PREFIX_TREE data block encoding, 
inserted data, flushed it, took a snapshot, cloned the snapshot, and then 
altered both tables to change the data block encoding to something other than 
PREFIX_TREE. Then I started from the assumption of not knowing any of that had 
happened, relying on the pre-upgrade tool to figure out how to make things work.
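
For anyone who wants to reproduce this, the setup described above can be 
sketched as an hbase shell session. Treat it as a sketch rather than a verified 
script: it needs a live cluster, and {{FAST_DIFF}} is just my stand-in for 
"something other than PREFIX_TREE".

{code}
hbase shell <<'EOF'
create 'example', { NAME => 'f1', DATA_BLOCK_ENCODING => 'PREFIX_TREE' }
put 'example', 'row1', 'f1:c1', 'value1'
flush 'example'
snapshot 'example', 'some_snapshot'
clone_snapshot 'some_snapshot', 'clone_of_example'
alter 'example', { NAME => 'f1', DATA_BLOCK_ENCODING => 'FAST_DIFF' }
alter 'clone_of_example', { NAME => 'f1', DATA_BLOCK_ENCODING => 'FAST_DIFF' }
EOF
{code}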

On each iteration I ran the same command: {{hbase --config /etc/hbase/conf 
pre-upgrade validate-hfile}}

h3. first run

The tool complains about the file in the {{example}} table, the one from the 
first flush. Here's the output:
{code}
18/07/06 15:46:33 WARN hbck.HFileCorruptionChecker: Found corrupt HFile hdfs://busbey-hbase-20649-1.example.com:8020/hbase/data/default/example/624357cffd1fae4422663c98155de45b/f1/bfc569db5fa543f5ba69bab594a85cea
org.apache.hadoop.hbase.io.hfile.CorruptHFileException: Problem reading HFile Trailer from file hdfs://busbey-hbase-20649-1.example.com:8020/hbase/data/default/example/624357cffd1fae4422663c98155de45b/f1/bfc569db5fa543f5ba69bab594a85cea
        at org.apache.hadoop.hbase.io.hfile.HFile.openReader(HFile.java:545)
        at org.apache.hadoop.hbase.io.hfile.HFile.createReader(HFile.java:611)
        at org.apache.hadoop.hbase.util.hbck.HFileCorruptionChecker.checkHFile(HFileCorruptionChecker.java:101)
        at org.apache.hadoop.hbase.util.hbck.HFileCorruptionChecker.checkColFamDir(HFileCorruptionChecker.java:185)
        at org.apache.hadoop.hbase.util.hbck.HFileCorruptionChecker.checkRegionDir(HFileCorruptionChecker.java:323)
        at org.apache.hadoop.hbase.util.hbck.HFileCorruptionChecker$RegionDirChecker.call(HFileCorruptionChecker.java:408)
        at org.apache.hadoop.hbase.util.hbck.HFileCorruptionChecker$RegionDirChecker.call(HFileCorruptionChecker.java:399)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Invalid data block encoding type in file info: PREFIX_TREE
        at org.apache.hadoop.hbase.io.hfile.HFileDataBlockEncoderImpl.createFromFileInfo(HFileDataBlockEncoderImpl.java:58)
        at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.<init>(HFileReaderImpl.java:246)
        at org.apache.hadoop.hbase.io.hfile.HFile.openReader(HFile.java:538)
        ... 14 more
Caused by: java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.hbase.io.encoding.DataBlockEncoding.PREFIX_TREE
        at java.lang.Enum.valueOf(Enum.java:238)
        at org.apache.hadoop.hbase.io.encoding.DataBlockEncoding.valueOf(DataBlockEncoding.java:31)
        at org.apache.hadoop.hbase.io.hfile.HFileDataBlockEncoderImpl.createFromFileInfo(HFileDataBlockEncoderImpl.java:56)
        ... 16 more
18/07/06 15:46:33 INFO tool.HFileContentValidator: Validating HFile contents under hdfs://busbey-hbase-20649-1.example.com:8020/hbase/archive
18/07/06 15:46:33 WARN tool.HFileContentValidator: Corrupt file: hdfs://busbey-hbase-20649-1.example.com:8020/hbase/data/default/example/624357cffd1fae4422663c98155de45b/f1/bfc569db5fa543f5ba69bab594a85cea
18/07/06 15:46:33 WARN tool.HFileContentValidator: There are 1 corrupted HFiles. Change data block encodings before upgrading. Check https://s.apache.org/prefixtree for instructions.
{code}

I think given the path {{/hbase/data/default/example/}} it's straightforward 
to reason "I need to do a major compaction of the {{example}} table". So I did that.
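
Concretely, that fix is a one-liner (a sketch; it needs a live cluster). Major 
compaction rewrites the table's HFiles with the table's current, 
non-PREFIX_TREE encoding:

{code}
echo "major_compact 'example'" | hbase shell
{code}

Note that major compaction is asynchronous; give it time to finish before 
re-running the validator.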

h3. second run

The tool complains about the same file, but this time it's in the archive 
directory.

{code}
18/07/06 15:50:42 INFO tool.HFileContentValidator: Validating HFile contents under hdfs://busbey-hbase-20649-1.example.com:8020/hbase/archive
18/07/06 15:50:42 WARN hbck.HFileCorruptionChecker: Found corrupt HFile hdfs://busbey-hbase-20649-1.example.com:8020/hbase/archive/data/default/example/624357cffd1fae4422663c98155de45b/f1/bfc569db5fa543f5ba69bab594a85cea
org.apache.hadoop.hbase.io.hfile.CorruptHFileException: Problem reading HFile Trailer from file hdfs://busbey-hbase-20649-1.example.com:8020/hbase/archive/data/default/example/624357cffd1fae4422663c98155de45b/f1/bfc569db5fa543f5ba69bab594a85cea
        at org.apache.hadoop.hbase.io.hfile.HFile.openReader(HFile.java:545)
        at org.apache.hadoop.hbase.io.hfile.HFile.createReader(HFile.java:611)
        at org.apache.hadoop.hbase.util.hbck.HFileCorruptionChecker.checkHFile(HFileCorruptionChecker.java:101)
        at org.apache.hadoop.hbase.util.hbck.HFileCorruptionChecker.checkColFamDir(HFileCorruptionChecker.java:185)
        at org.apache.hadoop.hbase.util.hbck.HFileCorruptionChecker.checkRegionDir(HFileCorruptionChecker.java:323)
        at org.apache.hadoop.hbase.util.hbck.HFileCorruptionChecker$RegionDirChecker.call(HFileCorruptionChecker.java:408)
        at org.apache.hadoop.hbase.util.hbck.HFileCorruptionChecker$RegionDirChecker.call(HFileCorruptionChecker.java:399)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Invalid data block encoding type in file info: PREFIX_TREE
        at org.apache.hadoop.hbase.io.hfile.HFileDataBlockEncoderImpl.createFromFileInfo(HFileDataBlockEncoderImpl.java:58)
        at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.<init>(HFileReaderImpl.java:246)
        at org.apache.hadoop.hbase.io.hfile.HFile.openReader(HFile.java:538)
        ... 14 more
Caused by: java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.hbase.io.encoding.DataBlockEncoding.PREFIX_TREE
        at java.lang.Enum.valueOf(Enum.java:238)
        at org.apache.hadoop.hbase.io.encoding.DataBlockEncoding.valueOf(DataBlockEncoding.java:31)
        at org.apache.hadoop.hbase.io.hfile.HFileDataBlockEncoderImpl.createFromFileInfo(HFileDataBlockEncoderImpl.java:56)
        ... 16 more
18/07/06 15:50:42 WARN tool.HFileContentValidator: Corrupt file: hdfs://busbey-hbase-20649-1.example.com:8020/hbase/archive/data/default/example/624357cffd1fae4422663c98155de45b/f1/bfc569db5fa543f5ba69bab594a85cea
18/07/06 15:50:42 WARN tool.HFileContentValidator: There are 1 corrupted HFiles. Change data block encodings before upgrading. Check https://s.apache.org/prefixtree for instructions.
{code}

This is where I think we need more details in the ref guide. I just major 
compacted the {{example}} table, so why is this file still in the archive 
directory? Can I delete it?

Let's start with "check whether any tables have a reference to the file because 
of cloning". Given the file's base name in the output 
({{bfc569db5fa543f5ba69bab594a85cea}}) we can use HDFS's find tool to locate it.

{code}
hdfs dfs -find /hbase/data -name '*-bfc569db5fa543f5ba69bab594a85cea'
/hbase/data/default/clone_of_example/24482ef2073ababe2f4587af2eccb6cd/f1/example=624357cffd1fae4422663c98155de45b-bfc569db5fa543f5ba69bab594a85cea
{code}

The above output means the {{clone_of_example}} table in the default namespace 
is referencing the flagged file. (If the {{example}} table hadn't been 
compacted, the file would still be under the encoded region name that appears 
between the "=" and the "-".) With a little more effort we could make an 
example bash invocation that pulls the region/file combos out of the tool 
output and iterates over them with hdfs finds.
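
Here's a rough cut at that bash invocation, as a hedged sketch. 
{{extract_flagged}} is a hypothetical helper name; it assumes the "Corrupt 
file:" warning lines shown above and the usual 
{{.../data/<namespace>/<table>/<region>/<cf>/<file>}} path layout:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read validator output on stdin and print the
# region/file combo for each "Corrupt file:" warning. The commented-out
# hdfs invocation is what you'd run per file on a real cluster.
extract_flagged() {
  grep 'Corrupt file: ' \
    | sed -e 's#.*Corrupt file: ##' \
    | while read -r path; do
        file="$(basename "${path}")"
        region="$(basename "$(dirname "$(dirname "${path}")")")"
        echo "region=${region} file=${file}"
        # On a live cluster, chase clone references with:
        #   hdfs dfs -find /hbase/data -name "*-${file}"
      done
}
```

Usage would be something like {{hbase --config /etc/hbase/conf pre-upgrade 
validate-hfile 2>&1 | extract_flagged}}.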

In this case, I did a major compaction of "clone_of_example".

h3. third run

Same output as the second run, so it's hard to tell whether we've made any 
progress.

"What if the file is referenced from a snapshot?"

We can iterate over every snapshot and look at its files for the one that got 
flagged. Like before, this example just takes the one file name instead of 
trying to parse the tool output and iterate over them.

{code}
$ for snapshot in $(hbase snapshotinfo -list-snapshots 2>/dev/null | tail -n -1 | cut -f 1 -d \|); do echo "checking snapshot named '${snapshot}'"; hbase snapshotinfo -snapshot "${snapshot}" -files 2>/dev/null | grep bfc569db5fa543f5ba69bab594a85cea; done
checking snapshot named 'some_snapshot'
   1.1 K example/624357cffd1fae4422663c98155de45b/f1/bfc569db5fa543f5ba69bab594a85cea (archive)
{code}

Found it! From here I can just delete the snapshot. Alternatively, I can 
"clean" it by cloning it, major compacting the resulting table, taking a new 
snapshot, and deleting the original. I did the latter.

{code}
hbase(main):001:0> create_namespace 'pre_upgrade_cleanup'
0 row(s) in 0.1090 seconds

hbase(main):002:0> clone_snapshot 'some_snapshot', 'pre_upgrade_cleanup:some_snapshot'
0 row(s) in 0.4880 seconds

hbase(main):003:0> describe 'pre_upgrade_cleanup:some_snapshot'
Table pre_upgrade_cleanup:some_snapshot is ENABLED
pre_upgrade_cleanup:some_snapshot
COLUMN FAMILIES DESCRIPTION
{NAME => 'f1', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'PREFIX_TREE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
1 row(s) in 0.0270 seconds

hbase(main):004:0> alter 'pre_upgrade_cleanup:some_snapshot', { NAME => 'f1', DATA_BLOCK_ENCODING => 'FAST_DIFF' }
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 1.9830 seconds

hbase(main):005:0> major_compact 'pre_upgrade_cleanup:some_snapshot'
0 row(s) in 0.0600 seconds
hbase(main):006:0> delete_snapshot 'some_snapshot'
0 row(s) in 0.0250 seconds
hbase(main):007:0> snapshot 'pre_upgrade_cleanup:some_snapshot', 'some_snapshot'
0 row(s) in 0.3410 seconds
hbase(main):008:0> disable 'pre_upgrade_cleanup:some_snapshot'
0 row(s) in 2.3790 seconds
hbase(main):009:0> drop 'pre_upgrade_cleanup:some_snapshot'
0 row(s) in 1.2540 seconds

hbase(main):010:0> drop_namespace 'pre_upgrade_cleanup'
0 row(s) in 0.0280 seconds

hbase(main):011:0> exit

{code}

h3. fourth run

Same output as the prior two runs. This time, checking for HFile references and 
snapshots turns up nothing.

{code}
$ hdfs dfs -find /hbase/data -name '*-bfc569db5fa543f5ba69bab594a85cea'
$ for snapshot in $(hbase snapshotinfo -list-snapshots 2>/dev/null | tail -n -1 | cut -f 1 -d \|); do echo "checking snapshot named '${snapshot}'"; hbase snapshotinfo -snapshot "${snapshot}" -files 2>/dev/null | grep bfc569db5fa543f5ba69bab594a85cea; done
checking snapshot named 'some_snapshot'
{code}

So I waited longer than the HFile cleaning period, because I forgot what to 
look for in the master log to see whether the cleaner chore had run.
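
For the record, the wait is governed by the archive cleaner's TTL, and the 
chore's activity shows up in the master log. Both checks below are hedged 
sketches: the property name comes from TimeToLiveHFileCleaner 
({{hbase.master.hfilecleaner.ttl}}, in milliseconds; the default is 300000, 
i.e. five minutes), and the log path is an assumption about the deployment.

{code}
# Is the archive TTL overridden anywhere?
grep -A1 'hbase.master.hfilecleaner.ttl' /etc/hbase/conf/hbase-site.xml

# Did the cleaner chore do anything recently?
grep -iE 'HFileCleaner|CleanerChore' /var/log/hbase/hbase-*-master-*.log | tail -n 5
{code}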

h3. fifth run

All clear.

{code}
18/07/06 16:30:39 INFO tool.HFileContentValidator: Validating HFile contents under hdfs://busbey-hbase-20649-1.example.com:8020/hbase/archive
18/07/06 16:30:39 INFO tool.HFileContentValidator: There are no incompatible HFiles under hdfs://busbey-hbase-20649-1.example.com:8020/hbase.
{code}


> Validate HFiles do not have PREFIX_TREE DataBlockEncoding
> ---------------------------------------------------------
>
>                 Key: HBASE-20649
>                 URL: https://issues.apache.org/jira/browse/HBASE-20649
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Peter Somogyi
>            Assignee: Balazs Meszaros
>            Priority: Minor
>             Fix For: 3.0.0
>
>         Attachments: HBASE-20649.master.001.patch, 
> HBASE-20649.master.002.patch, HBASE-20649.master.003.patch, 
> HBASE-20649.master.004.patch
>
>
> HBASE-20592 adds a tool to check column families on the cluster do not have 
> PREFIX_TREE encoding.
> Since it is possible that DataBlockEncoding was already changed but HFiles 
> are not rewritten yet we would need a tool that can verify the content of 
> hfiles in the cluster.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
