This is interesting. Does that mean Hadoop can use S3 as its DistributedFileSystem and EC2 machines as the compute nodes? If so, how does the namenode communicate with the datanodes (S3)?
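For context: the s3n connector talks to S3 directly over its REST API, so when S3 is the default filesystem there is no namenode/datanode protocol involved at all. A minimal core-site.xml sketch (bucket name and credential values are placeholders, not from this thread):

```xml
<!-- Sketch only: use S3 (native) as the default Hadoop filesystem. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>s3n://your-bucket/</value> <!-- placeholder bucket -->
  </property>
  <property>
    <name>fs.s3n.awsAccessKeyId</name>
    <value>YOUR_ACCESS_KEY</value>    <!-- placeholder credential -->
  </property>
  <property>
    <name>fs.s3n.awsSecretAccessKey</name>
    <value>YOUR_SECRET_KEY</value>    <!-- placeholder credential -->
  </property>
</configuration>
```

With this in place, MapReduce tasks on EC2 read and write S3 objects directly; HDFS (and its namenode/datanode traffic) is simply not in the picture.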
-----Original Message-----
From: Irfan Mohammed [mailto:[email protected]]
Sent: September 9, 2009 15:03
To: [email protected]
Subject: s3n intermediate storage problem

Hi,

I have a Pig script reading from and writing to S3:

$ export PIG_OPTS="-Dfs.default.name=s3n://bucket_1/"
$ pig
grunt> r0 = LOAD 'input2/transaction_ar20090909_14*' USING PigStorage('\u0002');
grunt> r1 = FILTER r0 BY client_id == 'xxxx';
grunt> STORE r1 INTO 'output2/' USING PigStorage(',');

I get the error below. It looks like Pig is trying to write and read the intermediate data in some temporary storage on S3, but I cannot find that folder in the bucket.

1. How do I find out where it is writing or reading the intermediate files?
2. Can I use S3 URLs only for LOAD/STORE and keep the intermediate files on a separate HDFS? If so, how do I give the URL paths?

Thanks,
Irfan

java.lang.Exception: org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: S3 GET failed for '/tmp%2Ftemp666717117%2Ftmp-2105109046%2Fpart-00000' XML Error Message: <?xml version="1.0" encoding="UTF-8"?><Error><Code>InvalidRange</Code><Message>The requested range is not satisfiable</Message><ActualObjectSize>0</ActualObjectSize><RequestId>1CF939F219CF8549</RequestId><HostId>w05Yp+WVCk2k/N9iVnqYmbFZzEiqszGYV3++yZjj+J/oaOJAifjUW4b5ZxIFDH2C</HostId><RangeRequested>bytes=0-</RangeRequested></Error>
    at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieve(Jets3tNativeFileSystemStore.java:154)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at org.apache.hadoop.fs.s3native.$Proxy1.retrieve(Unknown Source)
    at org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsInputStream.seek(NativeS3FileSystem.java:111)
    at org.apache.hadoop.fs.BufferedFSInputStream.seek(BufferedFSInputStream.java:76)
    at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:37)
    at org.apache.pig.backend.hadoop.datastorage.HSeekableInputStream.seek(HSeekableInputStream.java:64)
    at org.apache.pig.backend.executionengine.PigSlice.init(PigSlice.java:85)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SliceWrapper.makeReader(SliceWrapper.java:127)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getRecordReader(PigInputFormat.java:253)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:336)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: org.jets3t.service.S3ServiceException: S3 GET failed for '/tmp%2Ftemp666717117%2Ftmp-2105109046%2Fpart-00000' XML Error Message: <?xml version="1.0" encoding="UTF-8"?><Error><Code>InvalidRange</Code><Message>The requested range is not satisfiable</Message><ActualObjectSize>0</ActualObjectSize><RequestId>1CF939F219CF8549</RequestId><HostId>w05Yp+WVCk2k/N9iVnqYmbFZzEiqszGYV3++yZjj+J/oaOJAifjUW4b5ZxIFDH2C</HostId><RangeRequested>bytes=0-</RangeRequested></Error>
    at org.jets3t.service.impl.rest.httpclient.RestS3Service.performRequest(RestS3Service.java:424)
    at org.jets3t.service.impl.rest.httpclient.RestS3Service.performRestGet(RestS3Service.java:686)
    at org.jets3t.service.impl.rest.httpclient.RestS3Service.getObjectImpl(RestS3Service.java:1558)
    at org.jets3t.service.impl.rest.httpclient.RestS3Service.getObjectImpl(RestS3Service.java:1501)
    at org.jets3t.service.S3Service.getObject(S3Service.java:1876)
    at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieve(Jets3tNativeFileSystemStore.java:144)
    ... 17 more

    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:230)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:179)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:204)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:265)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:767)
    at org.apache.pig.PigServer.execute(PigServer.java:760)
    at org.apache.pig.PigServer.access$100(PigServer.java:89)
    at org.apache.pig.PigServer$Graph.execute(PigServer.java:931)
    at org.apache.pig.PigServer.executeBatch(PigServer.java:243)
    at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:112)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:140)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:88)
    at org.apache.pig.Main.main(Main.java:307)
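On question 2: Pig writes its intermediate tmp files to whatever fs.default.name points at, so one likely fix is to leave the default filesystem on HDFS and fully qualify only the LOAD/STORE paths with s3n:// URLs. A sketch, reusing the bucket and paths from the script above (the namenode host/port is a placeholder for your cluster's value):

```shell
# Keep the default FS on HDFS so intermediates stay off S3
# (namenode:9000 is a placeholder for your actual namenode address).
$ export PIG_OPTS="-Dfs.default.name=hdfs://namenode:9000/"
$ pig
grunt> r0 = LOAD 's3n://bucket_1/input2/transaction_ar20090909_14*' USING PigStorage('\u0002');
grunt> r1 = FILTER r0 BY client_id == 'xxxx';
grunt> STORE r1 INTO 's3n://bucket_1/output2/' USING PigStorage(',');
```

Only the initial read and final write then touch S3; everything between map and reduce stages goes through HDFS, which also avoids the InvalidRange seek on a zero-byte temp object seen in the trace above.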
