[
https://issues.apache.org/jira/browse/HBASE-5472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13637711#comment-13637711
]
Yu Li commented on HBASE-5472:
------------------------------
With the attached patch, when the generated HFile includes an invalid column
family, the bulkload output looks like this:
{panel}
{color:red}13/04/21 20:47:25 ERROR mapreduce.LoadIncrementalHFiles: Unmatched
family names found, unmatched family names in hfiles to be bulkload: [CF],
valid family names of table t2 are: [cf]{color}
13/04/21 20:47:25 ERROR mapreduce.LoadIncrementalHFiles:
-------------------------------------------------
Bulk load aborted with some files not yet loaded:
-------------------------------------------------
hdfs://9.125.91.85:9000/testBulkload/CF/b7fadfd7d188496cae412862170ea713
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
at java.lang.reflect.Method.invoke(Method.java:611)
at org.apache.hadoop.hbase.mapreduce.Driver.main(Driver.java:51)
...
{color:red}Caused by: java.lang.RuntimeException: Bulkload failed because
invalid family name found in bulkload target hfiles, please check your codes if
the hfiles are manually generated.{color}
at
org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:223)
at
org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:720)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
...
{panel}
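To illustrate the idea (this is a standalone sketch, not the patch itself; the class and method names below are made up for the example, the real change lives inside LoadIncrementalHFiles), the check amounts to listing the family directories under the bulkload output dir and comparing them with the column families of the target table, failing fast on any mismatch:
{noformat}
import java.io.IOException;
import java.util.Set;
import java.util.TreeSet;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.client.HTable;

public class BulkLoadFamilyCheck {
  /** Throws if a family directory under hfileDir has no matching CF on the table. */
  public static void validateFamilies(HTable table, Path hfileDir, Configuration conf)
      throws IOException {
    Set<String> validFamilies = new TreeSet<String>();
    for (HColumnDescriptor hcd : table.getTableDescriptor().getFamilies()) {
      validFamilies.add(hcd.getNameAsString());
    }
    Set<String> unmatched = new TreeSet<String>();
    FileSystem fs = hfileDir.getFileSystem(conf);
    for (FileStatus stat : fs.listStatus(hfileDir)) {
      // importtsv writes one subdirectory per column family under the output dir
      if (stat.isDir() && !validFamilies.contains(stat.getPath().getName())) {
        unmatched.add(stat.getPath().getName());
      }
    }
    if (!unmatched.isEmpty()) {
      throw new IOException("Unmatched family names found: " + unmatched
          + ", valid family names of table are: " + validFamilies);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, args[0]);          // e.g. "t2"
    validateFamilies(table, new Path(args[1]), conf);  // e.g. the importtsv output dir
  }
}
{noformat}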
My testing steps are:
{noformat}
1) Create a table "t2" with a single column family named "cf"
2) HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` \
   ${HADOOP_HOME}/bin/hadoop jar /opt/ibm/biginsights/hbase/hbase-VERSION.jar \
   importtsv -Dimporttsv.columns=HBASE_ROW_KEY,CF:a,CF:b \
   -Dimporttsv.bulk.output=hdfs://9.125.91.85:9000/testBulkload \
   -Dimporttsv.separator=, t2 /tmp/bulkload
3) HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` \
   ${HADOOP_HOME}/bin/hadoop jar /opt/ibm/biginsights/hbase/hbase-VERSION.jar \
   completebulkload hdfs://9.125.91.85:9000/testBulkload t2
{noformat}
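For reference, step 3) can also be driven programmatically through LoadIncrementalHFiles#doBulkLoad instead of the completebulkload driver. A minimal sketch, assuming the same output directory and table name as in the steps above (adjust the HDFS URI to your cluster); with the patch applied, an unmatched family makes this call fail immediately instead of retrying forever:
{noformat}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

public class CompleteBulkLoadExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "t2");
    LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
    // Directory produced by importtsv with -Dimporttsv.bulk.output=...
    loader.doBulkLoad(new Path("hdfs://9.125.91.85:9000/testBulkload"), table);
  }
}
{noformat}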
> LoadIncrementalHFiles loops forever if the target table misses a CF
> -------------------------------------------------------------------
>
> Key: HBASE-5472
> URL: https://issues.apache.org/jira/browse/HBASE-5472
> Project: HBase
> Issue Type: Bug
> Components: mapreduce
> Reporter: Lars Hofhansl
> Assignee: Yu Li
> Priority: Minor
> Attachments: HBASE-5472-trunk.patch
>
>
> I have some HFiles for two column families 'y','z', but I specified a target
> table that only has CF 'y'.
> I see the following repeated forever.
> ...
> 12/02/23 22:57:37 WARN mapreduce.LoadIncrementalHFiles: Attempt to bulk load
> region containing into table z with files [family:y
> path:hdfs://bunnypig:9000/bulk/z2/y/bd6f1c3cc8b443fc9e9e5fddcdaa3b09,
> family:z
> path:hdfs://bunnypig:9000/bulk/z2/z/38f12fdbb7de40e8bf0e6489ef34365d] failed.
> This is recoverable and they will be retried.
> 12/02/23 22:57:37 DEBUG client.MetaScanner: Scanning .META. starting at
> row=z,,00000000000000 for max=2147483647 rows using
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@7b7a4989
> 12/02/23 22:57:37 INFO mapreduce.LoadIncrementalHFiles: Split occured while
> grouping HFiles, retry attempt 1596 with 2 files remaining to group or split
> 12/02/23 22:57:37 INFO mapreduce.LoadIncrementalHFiles: Trying to load
> hfile=hdfs://bunnypig:9000/bulk/z2/y/bd6f1c3cc8b443fc9e9e5fddcdaa3b09 first=r
> last=r
> 12/02/23 22:57:37 INFO mapreduce.LoadIncrementalHFiles: Trying to load
> hfile=hdfs://bunnypig:9000/bulk/z2/z/38f12fdbb7de40e8bf0e6489ef34365d first=r
> last=r
> 12/02/23 22:57:37 DEBUG mapreduce.LoadIncrementalHFiles: Going to connect to
> server region=z,,1330066309814.d5fa76a38c9565f614755e34eacf8316.,
> hostname=localhost, port=60020 for row
> ...