[ https://issues.apache.org/jira/browse/HBASE-7704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621166#comment-13621166 ]
stack commented on HBASE-7704:
------------------------------

Here is more on how I think this would work (if anyone is listening).

We want a script that goes through all the hfiles in an hbase install and returns exit code 0 if no v1 files are found and non-zero if any v1 files are found (it should fail out as soon as it finds a v1 file). The script can be run while an hbase cluster is up. It should be called something like v1Free or noV1HFiles.

The script can be in java or jruby, since it is a one-off needed only to check an hbase install BEFORE we do an upgrade. It should implement http://hadoop.apache.org/docs/r2.0.3-alpha/api/org/apache/hadoop/util/Tool.html so it can pick up the configuration it needs to find the hbase to run against (I suppose this means it is written in java).

Ideally, the script should be written so that it would be easy to reuse in a mapreduce job if we ever need one (I would say we do not need the mapreduce job as part of this JIRA). I think this means the script has a method which takes a fully qualified hfile Path and returns true/false.

I would suggest the script have an executor service and take on the command line how many concurrent threads to run, with a reasonable default, so that the checking is done in parallel.

So, the script would walk hbase.rootdir looking for hfiles. Look at the hbase file utils, because it will need to skip over special files and directories. For each file found, it would read in the file's metadata and check for v1. See HFile for how it finds the metadata at the end of the file.
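A minimal, standalone sketch of the checker described above, under several assumptions. It walks a local directory with java.nio instead of Hadoop's FileSystem and leaves out the org.apache.hadoop.util.Tool wiring, so it compiles without HBase or Hadoop jars. The class name NoV1HFiles and the methods isV1, isCandidate and anyV1 are hypothetical. isV1 assumes the file trailer ends with a big-endian int whose low three bytes hold the major version (1 for v1 files). isCandidate assumes a rootdir/table/region/family/hfile layout and uses a crude name filter as a stand-in for the real hbase file utils.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class NoV1HFiles {

  /** Per-file check: true if the trailer's version int says major version 1. */
  public static boolean isV1(Path file) throws IOException {
    try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
      long len = raf.length();
      if (len < 4) {
        return false; // too short to carry a trailer, so not an hfile
      }
      raf.seek(len - 4);
      int version = raf.readInt();       // big-endian, as written by DataOutput.writeInt
      int major = version & 0x00ffffff;  // assumption: low 3 bytes hold the major version
      return major == 1;
    }
  }

  /** Assumed layout table/region/family/file; skips dot-dirs, recovered.edits and dotted (reference-style) names. */
  public static boolean isCandidate(Path root, Path p) {
    Path rel = root.relativize(p);
    if (rel.getNameCount() != 4) {
      return false;
    }
    String region = rel.getName(1).toString();
    String family = rel.getName(2).toString();
    String file = rel.getName(3).toString();
    return !region.startsWith(".") && !family.startsWith(".")
        && !family.equals("recovered.edits") && !file.contains(".");
  }

  /** Checks candidates on a fixed thread pool; returns true as soon as any v1 file is seen. */
  public static boolean anyV1(Path root, int threads) throws Exception {
    List<Path> files;
    try (Stream<Path> walk = Files.walk(root)) {
      files = walk.filter(Files::isRegularFile)
          .filter(p -> isCandidate(root, p))
          .collect(Collectors.toList());
    }
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      ExecutorCompletionService<Boolean> ecs = new ExecutorCompletionService<>(pool);
      for (Path f : files) {
        ecs.submit(() -> isV1(f));
      }
      for (int i = 0; i < files.size(); i++) {
        if (ecs.take().get()) {
          return true; // fail fast on the first v1 file
        }
      }
      return false;
    } finally {
      pool.shutdownNow(); // drop any checks still queued
    }
  }

  public static void main(String[] args) throws Exception {
    Path root = Paths.get(args.length > 0 ? args[0] : ".");
    int threads = args.length > 1 ? Integer.parseInt(args[1]) : 10; // assumed default
    System.exit(anyV1(root, threads) ? 1 : 0);
  }
}
```

Usage, under the same assumptions: `java NoV1HFiles /path/to/hbase.rootdir 16` exits 0 when no v1 files were found and 1 otherwise. Because isV1 takes a single path and returns a boolean, it is the piece a later mapreduce job could call per file; the real tool would take an org.apache.hadoop.fs.Path and read through FileSystem rather than java.nio.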
> migration tool that checks presence of HFile V1 files
> -----------------------------------------------------
>
>                 Key: HBASE-7704
>                 URL: https://issues.apache.org/jira/browse/HBASE-7704
>             Project: HBase
>          Issue Type: Task
>            Reporter: Ted Yu
>            Priority: Blocker
>             Fix For: 0.95.1
>
>
> Below was Stack's comment from HBASE-7660:
> Regards the migration 'tool', or 'tool' to check for presence of v1 files, I
> imagine it as an addition to the hfile tool
> http://hbase.apache.org/book.html#hfile_tool2 The hfile tool already takes a
> bunch of args including printing out meta. We could add an option to print
> out version only – or return 1 if version 1 or some such – and then do a bit
> of code to just list all hfiles and run this script against each. Could MR it
> if too many files.
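The "print out version only, or return 1 if version 1" option from the HBASE-7660 comment can be sketched as a small standalone program. This is not the hfile tool's actual interface: the class HFileVersion and its readVersion method are hypothetical, and the decoding assumes the same trailer convention as above (last 4 bytes, big-endian, major version in the low three bytes, minor version in the high byte).

```java
import java.io.IOException;
import java.io.RandomAccessFile;

public class HFileVersion {

  /** Returns {major, minor} decoded from the last 4 bytes of the file. */
  public static int[] readVersion(String path) throws IOException {
    try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
      if (raf.length() < 4) {
        throw new IOException(path + " is too short to hold a trailer");
      }
      raf.seek(raf.length() - 4);
      int serialized = raf.readInt();
      // Assumption: major version in the low 3 bytes, minor version in the high byte.
      return new int[] {serialized & 0x00ffffff, serialized >>> 24};
    }
  }

  public static void main(String[] args) throws IOException {
    int status = 0;
    for (String path : args) {
      int[] v = readVersion(path);
      System.out.println(path + ": version " + v[0] + "." + v[1]);
      if (v[0] == 1) {
        status = 1; // "return 1 if version 1"
      }
    }
    System.exit(status);
  }
}
```

A listing step (or a mapreduce job, if there are too many files) would then feed hfile paths to this check and treat a non-zero exit status as "v1 files present".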