Joe McDonnell created IMPALA-8003:
-------------------------------------
Summary: Dataload should detect diskspace issues and give a
reasonable error
Key: IMPALA-8003
URL: https://issues.apache.org/jira/browse/IMPALA-8003
Project: IMPALA
Issue Type: Improvement
Components: Infrastructure
Affects Versions: Impala 3.2.0
Reporter: Joe McDonnell
Assignee: Joe McDonnell
The minicluster dataload can fail when IMPALA_HOME runs out of disk space (or,
more precisely, when HDFS believes it has run out of free disk space). The output
from bin/load-data.py tells people to look at the SQL logs in data_loading/sql/,
but the errors in those logs do not identify the cause:
{noformat}
Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask (state=08S01,code=2)
java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
{noformat}
There is no indication of where to look or what might be wrong. This is
especially confusing because it can happen even when the machine appears to have
plenty of free disk space.
The useful information for this case is in the HDFS NameNode log, which
contains lines like:
{noformat}
2018-12-18 14:59:29,594 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Not enough replicas was chosen. Reason:{NOT_ENOUGH_STORAGE_SPACE=2}
2018-12-18 14:59:29,594 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 3 to reach 3 (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy and org.apache.hadoop.net.NetworkTopology
{noformat}
On failure, dataload should check whether it hit this type of disk space issue
and, if so, print an error message that says so.
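One possible shape for such a check is sketched below. It is not the actual fix: the function names are hypothetical, the NameNode log path has to be supplied by the caller, and the only thing taken from this report is the NOT_ENOUGH_STORAGE_SPACE marker from the log excerpt above. It assumes Python 3.

```python
# Marker taken from the NameNode log excerpt in this issue.
STORAGE_MARKER = "NOT_ENOUGH_STORAGE_SPACE"


def find_storage_space_errors(log_lines):
    """Return the log lines that mention the storage-space marker."""
    return [line.rstrip("\n") for line in log_lines if STORAGE_MARKER in line]


def diagnose_namenode_log(log_path):
    """Return a readable hint if the log shows HDFS ran out of space, else None.

    log_path is a hypothetical parameter: the caller must know where the
    minicluster writes its NameNode log.
    """
    try:
        with open(log_path, errors="replace") as log_file:
            hits = find_storage_space_errors(log_file)
    except OSError:
        # Log missing or unreadable: nothing to report.
        return None
    if not hits:
        return None
    return (
        "HDFS could not place block replicas because it believes it is out "
        "of storage space ({} matching lines in {}).\n"
        "Last occurrence: {}".format(len(hits), log_path, hits[-1])
    )
```

A dataload failure handler could call diagnose_namenode_log() and, when it returns a message, print that message ahead of the existing pointer to the SQL logs.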