[ https://issues.apache.org/jira/browse/HBASE-5472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13637711#comment-13637711 ]
Yu Li commented on HBASE-5472:
------------------------------

With the attached patch, when the generated hfiles include an invalid column family, the bulkload output looks like this:

{panel}
{color:red}13/04/21 20:47:25 ERROR mapreduce.LoadIncrementalHFiles: Unmatched family names found, unmatched family names in hfiles to be bulkload: [CF], valid family names of table t2 are: [cf]
13/04/21 20:47:25 ERROR mapreduce.LoadIncrementalHFiles: -------------------------------------------------
Bulk load aborted with some files not yet loaded:
-------------------------------------------------
  hdfs://9.125.91.85:9000/testBulkload/CF/b7fadfd7d188496cae412862170ea713
Exception in thread "main" java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
	at java.lang.reflect.Method.invoke(Method.java:611)
	at org.apache.hadoop.hbase.mapreduce.Driver.main(Driver.java:51)
	...
Caused by: java.lang.RuntimeException: Bulkload failed because invalid family name found in bulkload target hfiles, please check your codes if the hfiles are manually generated.
	at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:223)
	at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:720)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
	...{color}
{panel}

My testing steps are:
{noformat}
1) Create a table t2 with a single column family named "cf"
2) HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` ${HADOOP_HOME}/bin/hadoop jar \
   /opt/ibm/biginsights/hbase/hbase-VERSION.jar importtsv \
   -Dimporttsv.columns=HBASE_ROW_KEY,CF:a,CF:b \
   -Dimporttsv.bulk.output=hdfs://9.125.91.85:9000/testBulkload \
   -Dimporttsv.separator=, t2 /tmp/bulkload
3) HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` ${HADOOP_HOME}/bin/hadoop jar \
   /opt/ibm/biginsights/hbase/hbase-VERSION.jar completebulkload \
   hdfs://9.125.91.85:9000/testBulkload t2
{noformat}

> LoadIncrementalHFiles loops forever if the target table misses a CF
> -------------------------------------------------------------------
>
>                 Key: HBASE-5472
>                 URL: https://issues.apache.org/jira/browse/HBASE-5472
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>            Reporter: Lars Hofhansl
>            Assignee: Yu Li
>            Priority: Minor
>         Attachments: HBASE-5472-trunk.patch
>
>
> I have some HFiles for two column families 'y' and 'z', but I specified a target table that only has CF 'y'.
> I see the following repeated forever:
> ...
> 12/02/23 22:57:37 WARN mapreduce.LoadIncrementalHFiles: Attempt to bulk load region containing into table z with files [family:y path:hdfs://bunnypig:9000/bulk/z2/y/bd6f1c3cc8b443fc9e9e5fddcdaa3b09, family:z path:hdfs://bunnypig:9000/bulk/z2/z/38f12fdbb7de40e8bf0e6489ef34365d] failed. This is recoverable and they will be retried.
> 12/02/23 22:57:37 DEBUG client.MetaScanner: Scanning .META.
> starting at row=z,,00000000000000 for max=2147483647 rows using org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@7b7a4989
> 12/02/23 22:57:37 INFO mapreduce.LoadIncrementalHFiles: Split occured while grouping HFiles, retry attempt 1596 with 2 files remaining to group or split
> 12/02/23 22:57:37 INFO mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://bunnypig:9000/bulk/z2/y/bd6f1c3cc8b443fc9e9e5fddcdaa3b09 first=r last=r
> 12/02/23 22:57:37 INFO mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://bunnypig:9000/bulk/z2/z/38f12fdbb7de40e8bf0e6489ef34365d first=r last=r
> 12/02/23 22:57:37 DEBUG mapreduce.LoadIncrementalHFiles: Going to connect to server region=z,,1330066309814.d5fa76a38c9565f614755e34eacf8316., hostname=localhost, port=60020 for row
> ...
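The fix's core idea, failing fast on unmatched families instead of retrying forever, can be sketched roughly as follows. This is an illustrative simplification, not the actual patch code: the class and method names here are hypothetical, and the real LoadIncrementalHFiles works with byte[] family names from HColumnDescriptor rather than plain strings.

{noformat}
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the family-name check added to doBulkLoad:
// collect the families referenced by the hfiles, diff them against the
// target table's families (names are case-sensitive, so "CF" != "cf"),
// and abort with an error instead of entering the retry loop.
public class FamilyValidator {

    /** Returns the hfile families that do not exist in the target table. */
    public static Set<String> unmatchedFamilies(Set<String> hfileFamilies,
                                                Set<String> tableFamilies) {
        Set<String> unmatched = new HashSet<>(hfileFamilies);
        unmatched.removeAll(tableFamilies);
        return unmatched;
    }

    /** Throws instead of looping when any hfile family is invalid. */
    public static void validate(Set<String> hfileFamilies,
                                Set<String> tableFamilies) {
        Set<String> unmatched = unmatchedFamilies(hfileFamilies, tableFamilies);
        if (!unmatched.isEmpty()) {
            throw new RuntimeException(
                "Unmatched family names found in hfiles: " + unmatched
                + ", valid family names are: " + tableFamilies);
        }
    }
}
{noformat}

With the importtsv run above, the hfile family set would be {CF} and the table family set {cf}, so the check reports [CF] as unmatched and aborts up front, matching the log output shown in the comment.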