ZFabrik created HBASE-25857:
-------------------------------
Summary: HBase bulk import fails with exitCode -100
Key: HBASE-25857
URL: https://issues.apache.org/jira/browse/HBASE-25857
Project: HBase
Issue Type: Bug
Affects Versions: 2.2.5
Reporter: ZFabrik
I want to import the data from an HBase 1.0 cluster to our new HBase 2.2.5
cluster.
Our setup is as follows:
* 6 data nodes with 250GB disk space each
* total DFS capacity (as reported by Hadoop): 1.46 TB
* one additional node running namenode, hmaster, resource manager
The data I'm trying to import was created by HBase export on the 1.0 HBase
cluster and takes 14.9 GB in HDFS (according to `hdfs dfs -du -h`).
The import uses bulk output with `hasLargeResult=true`:
{noformat}
> hdfs dfs -du -h /
2.6 G /TMP_IMPORT
...
> yarn jar $HBASE_HOME/lib/hbase-mapreduce-2.2.5.jar import \
-Dmapreduce.map.speculative=false \
-Dmapreduce.reduce.speculative=false \
-Dimport.bulk.output=/HFILES \
-Dimport.bulk.hasLargeResult=true \
my_table /TMP_IMPORT
{noformat}
Approximately 3 hours later the import fails with this error message:
{noformat}
Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in
shuffle in OnDiskMerger - Thread to merge on-disk map-outputs
at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not
find any valid local directory for
opt/seritrack/tt/nosql/data/yarn/usercache/seritrack/appcache/application_1620201940366_0003/output/attempt_1620201940366_0003_r_000000_1/map_40.out
at
org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:446)
at
org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:151)
at
org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:132)
at
org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl$OnDiskMerger.merge(MergeManagerImpl.java:549)
at
org.apache.hadoop.mapreduce.task.reduce.MergeThread.run(MergeThread.java:94)
{noformat}
The Yarn web UI reports this:
{noformat}
AM Container for appattempt_1620201940366_0003_000001 exited with exitCode: -100
Failing this attempt. Diagnostics: Container released on a *lost* node. For more
detailed output, check the application tracking page:
http://master:8088/cluster/app/application_1620201940366_0003 Then click on
links to logs of each attempt.
{noformat}
Hadoop name node reports:
{noformat}
Configured Capacity: 1.46 TB
DFS Used: 34.26 GB (2.28%)
Non DFS Used: 1.33 TB <<<<< !!!
{noformat}
We can see that Yarn occupies more than 200GB on each data node (inside
.../data/yarn/usercache/xyz/appcache), so that it uses 1.33 TB in total, which
is almost the whole configured capacity of 1.46 TB. On each data node there are
more than 100 files named like
`attempt_1620201940366_0003_m_000035_1_spill_37.out`, each of which is 77MB in
size.
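For reference, this is roughly how the per-application appcache usage can be
checked on a data node (the local-dir root below is assumed from the path in
the stack trace above; adjust it to the actual `yarn.nodemanager.local-dirs`
setting):

```shell
# Sum up Yarn's per-application local cache on one data node.
# YARN_LOCAL is an assumption taken from the stack trace above.
YARN_LOCAL=/opt/seritrack/tt/nosql/data/yarn
du -sh "$YARN_LOCAL"/usercache/*/appcache/* 2>/dev/null | sort -h
```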
So my question is: how can I use bulk import if it needs 100x the amount of
disk space as intermediate cache?
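One mitigation we are considering (not yet verified on our side): compressing
the intermediate map output, which should shrink the spill files in the
appcache considerably. The two compression properties are standard Hadoop
MapReduce settings; whether Snappy is actually usable depends on the cluster's
native libraries:

```shell
# Same import as above, but with map-output (shuffle) compression enabled.
yarn jar $HBASE_HOME/lib/hbase-mapreduce-2.2.5.jar import \
  -Dmapreduce.map.speculative=false \
  -Dmapreduce.reduce.speculative=false \
  -Dmapreduce.map.output.compress=true \
  -Dmapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec \
  -Dimport.bulk.output=/HFILES \
  -Dimport.bulk.hasLargeResult=true \
  my_table /TMP_IMPORT
```

Whether this avoids the exitCode -100 entirely or only delays it is exactly
what we'd like to understand.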
--
This message was sent by Atlassian Jira
(v8.3.4#803005)