AccessControl exception thrown by estimateNumberOfReducers if data includes
unreadable subdirectories
-----------------------------------------------------------------------------------------------------
Key: PIG-2386
URL: https://issues.apache.org/jira/browse/PIG-2386
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.9.1
Reporter: Adam Portley
Priority: Minor
Pig estimates the number of reducers based on input data size. The code to
calculate the input size throws an exception if the data contains any
unreadable subdirectories (perhaps subsets of the data with restricted read
permissions):
Caused by: org.apache.hadoop.security.AccessControlException:
org.apache.hadoop.security.AccessControlException: Permission denied:
user=<removed>, access=READ_EXECUTE, inode="secure":owner:secure:rwxr-x---
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at
org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:95)
at
org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)
at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:669)
at
org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:280)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getPathLength(JobControlCompiler.java:791)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getPathLength(JobControlCompiler.java:794)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getTotalInputFileSize(JobControlCompiler.java:779)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.estimateNumberOfReducers(JobControlCompiler.java:739)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:587)
... 12 more
Pig should catch this exception and ignore unreadable directories when
calculating the input size.
Users can work around the issue by specifying default_parallel or PARALLEL.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira