[ https://issues.apache.org/jira/browse/PHOENIX-976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13994881#comment-13994881 ]

Gabriel Reid commented on PHOENIX-976:
--------------------------------------

The HFiles produced by the batch load process are owned by the user you run 
the tool as, but HBase then needs to move those HFiles into place, which means 
the hbase user needs write access on the directories where the HFiles are 
created.

There are a few ways I can think of to work around this:
* Run the bulk load tool as the hbase user
* Supply a custom umask configuration parameter when running the bulk load tool 
so that the created directories will be writable by the hbase user. You can do 
this by including -Ddfs.umaskmode=000 in the parameters you supply to the tool, 
as in:
{code}
hadoop jar phoenix-4.0.0-incubating-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool -Ddfs.umaskmode=000
{code}
* Temporarily turn off permissions on HDFS entirely (not a good idea in 
general, but it will make it easier to experiment with this stuff)
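As a local illustration of the umask option, the sketch below shows why a 000 
umask makes freshly created directories world-writable, which is the same 
effect -Ddfs.umaskmode=000 has on the HFile output directories in HDFS. The 
directory names are throwaway examples, not anything the bulk load tool 
actually creates:

```shell
#!/bin/sh
# Local sketch of the umask workaround: with a 000 umask, a newly created
# directory gets mode 777, so another user (e.g. hbase) can write into it
# and rename/move files out of it. "hfile-out" is a hypothetical stand-in
# for the MR job's --output directory.
parent=$(mktemp -d)              # scratch area for the demo
umask 000                        # same idea as passing -Ddfs.umaskmode=000
mkdir "$parent/hfile-out"        # 0777 & ~000 = 777
stat -c '%a' "$parent/hfile-out" # prints 777: world-writable
```

With the default 022 umask the same mkdir would produce a 755 directory, 
which is exactly the situation where the hbase user's rename attempts fail.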


> bulk load issue with file permissions
> -------------------------------------
>
>                 Key: PHOENIX-976
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-976
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 3.0.0
>         Environment: CDH 4.8.0
>            Reporter: Cristian Armaselu
>
> Created and copied a file in hdfs in /tmp/phload/customers.dat
> /tmp/phload folder permission is 777
> Executed:
> hadoop --config /etc/hadoop/conf/ jar phoenix-3.0.0-incubating-client.jar 
> org.apache.phoenix.mapreduce.CsvBulkLoadTool -libjars antlr-runtime-3.4.jar 
> --table CUSTOMERS3 --input /tmp/phload/customers.dat --output /tmp/phload/tmp
> MR completes the task successfully
> In the client loading we can see:
> 14/05/11 13:39:32 INFO mapreduce.LoadIncrementalHFiles: Trying to load 
> hfile=hdfs://localhost.localdomain:8020/tmp/phload/tmp/default/0f281fbd70e6443e82c1a559441654e3
>  first=0-customer_id 0_0 last=9-customer_id 9_0
> Then nothing moves past that point.
> In hbase region server we can see:
> 2014-05-11 13:39:32,950 INFO org.apache.hadoop.hbase.regionserver.Store: 
> Validating hfile at 
> hdfs://localhost.localdomain:8020/tmp/phload/tmp/default/0f281fbd70e6443e82c1a559441654e3
>  for inclusion in store default region 
> CUSTOMERS3,,1399840091691.da914ff9abd642725ac5839b8787c0bb.
> 2014-05-11 13:39:32,962 INFO org.apache.hadoop.hbase.HBaseFileSystem: Rename 
> Directory, sleeping 1000 times 1
> 2014-05-11 13:39:33,964 INFO org.apache.hadoop.hbase.HBaseFileSystem: Rename 
> Directory, sleeping 1000 times 2
> 2014-05-11 13:39:35,966 INFO org.apache.hadoop.hbase.HBaseFileSystem: Rename 
> Directory, sleeping 1000 times 3
> 2014-05-11 13:39:38,969 INFO org.apache.hadoop.hbase.HBaseFileSystem: Rename 
> Directory, sleeping 1000 times 4
> 2014-05-11 13:39:42,972 INFO org.apache.hadoop.hbase.HBaseFileSystem: Rename 
> Directory, sleeping 1000 times 5
> 2014-05-11 13:39:47,975 INFO org.apache.hadoop.hbase.HBaseFileSystem: Rename 
> Directory, sleeping 1000 times 6
> 2014-05-11 13:39:53,977 INFO org.apache.hadoop.hbase.HBaseFileSystem: Rename 
> Directory, sleeping 1000 times 7
> 2014-05-11 13:40:00,980 INFO org.apache.hadoop.hbase.HBaseFileSystem: Rename 
> Directory, sleeping 1000 times 8
> 2014-05-11 13:40:08,983 INFO org.apache.hadoop.hbase.HBaseFileSystem: Rename 
> Directory, sleeping 1000 times 9
> 2014-05-11 13:40:17,987 INFO org.apache.hadoop.hbase.HBaseFileSystem: Rename 
> Directory, sleeping 1000 times 10
> 2014-05-11 13:40:27,989 WARN org.apache.hadoop.hbase.HBaseFileSystem: Rename 
> Directory, retries exhausted
> 2014-05-11 13:40:27,990 ERROR org.apache.hadoop.hbase.regionserver.HRegion: 
> There was a partial failure due to IO when attempting to load default : 
> hdfs://localhost.localdomain:8020/tmp/phload/tmp/default/0f281fbd70e6443e82c1a559441654e3
> The error is caused by hbase trying to read the folder:
> /tmp/phload/tmp/default/0f281fbd70e6443e82c1a559441654e3
> As soon as the folder permission is changed to 777 the import continues and 
> data is loaded.
> I would have expected that, since I provided a 777 folder in the first 
> place (--output), everything would work smoothly.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
