[
https://issues.apache.org/jira/browse/PIG-2707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Daniel Dai resolved PIG-2707.
-----------------------------
Resolution: Won't Fix
Try s3n://foo/1[4-6].
Pig mostly relies on Hadoop to parse globs (except that Pig additionally
accepts a comma-separated file list), and .. is not valid Hadoop globbing
syntax.
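Since Hadoop's glob parser supports {a,b,c} alternation and [4-6] character classes but not bash-style {14..16} numeric ranges, one workaround is to generate the comma form programmatically. A minimal sketch (the expand_range helper, and reusing the s3://foo prefix and 14-16 range from the report, are illustrative, not part of Pig or Hadoop):

```python
def expand_range(prefix, start, end, suffix=""):
    """Build a {a,b,c} alternation glob for an inclusive numeric range,
    which Hadoop's glob parser does understand."""
    alts = ",".join(str(i) for i in range(start, end + 1))
    return "%s{%s}%s" % (prefix, alts, suffix)

# The resulting path can be passed to a Pig LOAD statement.
path = expand_range("s3://foo/", 14, 16)
print(path)  # s3://foo/{14,15,16}
```

For small contiguous ranges, a character class such as 1[4-6] (as suggested above) avoids the helper entirely.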
> Range globs do not work
> -----------------------
>
> Key: PIG-2707
> URL: https://issues.apache.org/jira/browse/PIG-2707
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.9.1
> Environment: Amazon Elastic MapReduce. Hadoop 0.20.205
> Reporter: Christian Birch
> Priority: Minor
>
> Using e.g. 's3://foo/{14,15,16}' to load files works like a charm, but neither
> 's3://foo/{14..16}' nor 's3://foo/{14...16}' works (I am not sure whether it
> is two or three dots, since both fail). Either way, I get errors like the
> following when using ranges (regardless of two or three dots):
> Failed Jobs:
> JobId Alias Feature Message Outputs
> N/A A MAP_ONLY Message:
> org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input
> Pattern s3://foo/{14...16} matches 0 files
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:282)
> at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:999)
> at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1016)
> at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:172)
> at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:934)
> at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:887)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
> at
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:887)
> at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:861)
> at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
> at
> org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
> at
> org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input
> Pattern s3://foo/{14...16} matches 0 files
> at
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235)
> at
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:270)
> ... 14 more
> hdfs://10.53.9.207:9000/tmp/temp-783548169/tmp1508748976,
> Input(s):
> Failed to read data from "s3://foo/{14...16}"
> Output(s):
> Failed to produce result in
> "hdfs://10.53.9.207:9000/tmp/temp-783548169/tmp1508748976"
> Counters:
> Total records written : 0
> Total bytes written : 0
> Spillable Memory Manager spill count : 0
> Total bags proactively spilled: 0
> Total records proactively spilled: 0
> Job DAG:
> null
> ---
> I would expect {14...16} to work just like {14,15,16}:
> 2012-05-17 18:29:59,098 [main] INFO
> org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script:
> UNKNOWN
> 2012-05-17 18:29:59,164 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler -
> File concatenation threshold: 100 optimistic? false
> 2012-05-17 18:29:59,165 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
> - MR plan size before optimization: 1
> 2012-05-17 18:29:59,165 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
> - MR plan size after optimization: 1
> 2012-05-17 18:29:59,182 [main] INFO
> org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to
> the job
> 2012-05-17 18:29:59,182 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
> - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
> 2012-05-17 18:31:14,493 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
> - Setting up single store job
> 2012-05-17 18:31:14,567 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 1 map-reduce job(s) waiting for submission.
> 2012-05-17 18:31:14,582 [Thread-30] INFO org.apache.hadoop.mapred.JobClient
> - Default number of map tasks: null
> 2012-05-17 18:31:14,583 [Thread-30] INFO org.apache.hadoop.mapred.JobClient
> - Setting default number of map tasks based on cluster size to : 8
> 2012-05-17 18:31:14,583 [Thread-30] INFO org.apache.hadoop.mapred.JobClient
> - Default number of reduce tasks: 0
> 2012-05-17 18:31:15,072 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 0% complete
> 2012-05-17 18:31:16,870 [Thread-30] INFO
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to
> process : 1
> 2012-05-17 18:31:16,870 [Thread-30] INFO
> org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input
> paths (combined) to process : 1
> 2012-05-17 18:31:18,099 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - HadoopJobId: job_201205171523_0033
> 2012-05-17 18:31:18,099 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - More information at:
> http://10.53.9.207:9100/jobdetails.jsp?jobid=job_201205171523_0033
> 2012-05-17 18:31:58,609 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 50% complete
> 2012-05-17 18:32:08,186 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 100% complete
> 2012-05-17 18:32:08,187 [main] INFO
> org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
> HadoopVersion PigVersion UserId StartedAt FinishedAt Features
> 0.20.205 0.9.1-amzn hadoop 2012-05-17 18:29:59 2012-05-17
> 18:32:08 UNKNOWN
> Success!
> Job Stats (time in seconds):
> JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime
> MaxReduceTime MinReduceTime AvgReduceTime Alias Feature Outputs
> job_201205171523_0033 1 0 12 12 12 0 0
> 0 A MAP_ONLY
> hdfs://10.53.9.207:9000/tmp/temp-783548169/tmp1447928118,
> Input(s):
> Successfully read 3 records (410 bytes) from: "s3://foo/{14,15,16}"
> Output(s):
> Successfully stored 3 records (1405 bytes) in:
> "hdfs://10.53.9.207:9000/tmp/temp-783548169/tmp1447928118"
> Counters:
> Total records written : 3
> Total bytes written : 1405
> Spillable Memory Manager spill count : 0
> Total bags proactively spilled: 0
> Total records proactively spilled: 0
> Job DAG:
> job_201205171523_0033
> ---
> I am not sure whether this is a Pig/Hadoop issue or an Amazon EMR/S3 issue.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira