Moritz Moeller created PIG-3246:
-----------------------------------

             Summary: not possible to use remote filesystems (S3) in a pig script
                 Key: PIG-3246
                 URL: https://issues.apache.org/jira/browse/PIG-3246
             Project: Pig
          Issue Type: Bug
         Environment: Apache Pig version 0.10.0-cdh4.2.0 (rexported)
Hadoop 2.0.0-cdh4.2.0
            Reporter: Moritz Moeller


My Hadoop cluster is configured to use hdfs://namenode/; hdfs dfs and Pig 
scripts both work fine.
Now I want to read data from S3. Running hdfs dfs -ls s3n://mybucket/file.csv 
works fine.
A Pig script doing LOAD 's3n://mybucket/test.csv', however, fails. It looks as 
if Pig is performing the LOAD request using an HDFS FileSystem.
I wasn't sure whether to mark this as a bug or an improvement, as I do not know 
if this is supposed to be possible. Since it is a basic feature of Hadoop, I 
would expect it to work in Pig, too.


org.apache.pig.backend.executionengine.ExecException: ERROR 2118: java.net.UnknownHostException: mybucket
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:288)
        at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:452)
        at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:469)
        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:366)
        at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1218)
        at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1215)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1215)
        at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:336)
        at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.run(JobControl.java:233)
        at java.lang.Thread.run(Thread.java:722)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:257)
Caused by: java.lang.IllegalArgumentException: java.net.UnknownHostException: sdfa
        at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:414)
        at org.apache.hadoop.security.SecurityUtil.buildDTServiceName(SecurityUtil.java:295)
        at org.apache.hadoop.fs.FileSystem.getCanonicalServiceName(FileSystem.java:247)
        at org.apache.hadoop.fs.FileSystem.collectDelegationTokens(FileSystem.java:468)
        at org.apache.hadoop.fs.FileSystem.addDelegationTokens(FileSystem.java:452)
        at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:121)
        at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100)
        at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80)
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:205)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:269)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:274)
        ... 13 more
Caused by: java.net.UnknownHostException: mybucket
        ... 25 more
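
A note on reading the trace: the name in the UnknownHostException is the same 
string that sits in the authority position of the s3n:// location, and the 
exception is raised from SecurityUtil.buildTokenService while tokens are being 
collected. That suggests the bucket name may have been treated as a network 
host. The sketch below is only an illustration of how the location string 
splits up; it is not Hadoop or Pig code, Python's urllib stands in for Hadoop's 
URI handling as an assumption, and describe_location is a hypothetical helper 
name.

```python
from urllib.parse import urlsplit


def describe_location(location):
    """Split a Hadoop-style location string into scheme, authority, and path.

    Hypothetical helper for illustration only; urllib is an assumed stand-in
    for the URI parsing that Hadoop performs internally.
    """
    parts = urlsplit(location)
    return {
        "scheme": parts.scheme,
        "authority": parts.hostname,
        "path": parts.path,
    }


# Location used in the failing LOAD statement from the report.
s3_location = describe_location("s3n://mybucket/test.csv")

# Cluster filesystem mentioned in the report, for comparison.
hdfs_location = describe_location("hdfs://namenode/")

print(s3_location)
print(hdfs_location)
```

For hdfs://namenode/ the authority is a real, resolvable host, whereas for the 
s3n:// location it is the bucket name, which would not be expected to resolve 
through DNS.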




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
