HBase bulk load wiki page improvements
--------------------------------------

                 Key: HIVE-2590
                 URL: https://issues.apache.org/jira/browse/HIVE-2590
             Project: Hive
          Issue Type: Bug
          Components: Documentation, HBase Handler
            Reporter: Ben West
            Priority: Minor


Some suggestions for the page 
https://cwiki.apache.org/confluence/display/Hive/HBaseBulkLoad which appears to 
be somewhat out of date:

1. The number of reduce tasks in the "Sort Data" phase apparently must be 
exactly one more than the number of keys selected in the "Range 
Partitioning" step; otherwise the job fails with an error like this:


Caused by: java.lang.IllegalArgumentException: Can't read partitions file
        at org.apache.hadoop.mapred.lib.TotalOrderPartitioner.configure(TotalOrderPartitioner.java:91)
        ... 15 more
Caused by: java.io.IOException: Wrong number of partitions in keyset
        at org.apache.hadoop.mapred.lib.TotalOrderPartitioner.configure(TotalOrderPartitioner.java:72)
        ... 15 more

If so, the page should state this requirement explicitly.
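
To illustrate the counting rule in item 1, here is a minimal Python sketch. It 
is only a simplified model of range partitioning, not Hadoop's 
TotalOrderPartitioner code; the function names and the sample split keys are 
hypothetical. The point is that N sorted split keys divide the key space into 
N + 1 ranges, so N + 1 reducers are needed.

```python
from bisect import bisect_right


def required_reducers(split_keys):
    """N split keys cut the key space into N + 1 ranges, one per reducer."""
    return len(split_keys) + 1


def partition_for(key, split_keys):
    """Return the index of the range that `key` falls into.

    `split_keys` must be sorted. In this model a key equal to a split key
    goes to the range above that split key.
    """
    return bisect_right(split_keys, key)


def check_reducer_count(num_reducers, split_keys):
    """Raise if the reducer count does not match the number of ranges."""
    expected = required_reducers(split_keys)
    if num_reducers != expected:
        raise ValueError(
            f"{len(split_keys)} split keys need {expected} reducers, "
            f"got {num_reducers}"
        )


# Hypothetical split keys, standing in for the "Range Partitioning" output.
split_keys = ["g", "n", "t"]
check_reducer_count(4, split_keys)  # passes: 3 split keys, 4 reducers
```

With these three split keys, any other reducer count makes the check fail, 
which mirrors the "Wrong number of partitions in keyset" condition above.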

2. The page recommends using the "loadtable" Ruby script to load data into 
HBase, but on newer versions of HBase (e.g. 0.90.3) the script fails with 
this error: 

    DISABLED!!!! Use completebulkload instead.  See tail of http://hbase.apache.org/bulk-loads.html

The instructions should probably be updated to use completebulkload instead of 
this script.
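
As a rough sketch of what the updated instructions might invoke, the Python 
helper below assembles a completebulkload command line. It assumes the 
"hadoop jar <hbase jar> completebulkload <HFile directory> <table>" form 
described on the HBase bulk load page linked above; the helper name, jar 
name, directory and table name are placeholders, not values from the wiki. 
The command is only built here, not executed.

```python
def completebulkload_command(hbase_jar, hfile_dir, table_name):
    """Build the argument list for running completebulkload.

    All three arguments are placeholders supplied by the caller:
    the HBase jar, the directory of generated HFiles, and the target table.
    """
    return ["hadoop", "jar", hbase_jar, "completebulkload", hfile_dir, table_name]


# Hypothetical values, for illustration only.
cmd = completebulkload_command("hbase-VERSION.jar", "/tmp/hbsort", "new_hbase_table")
print(" ".join(cmd))
```

The resulting list could be passed to subprocess.run on a machine where the 
hadoop command and the HBase jar are available.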


