Christian Birch created PIG-2707:
------------------------------------
Summary: Range globs do not work
Key: PIG-2707
URL: https://issues.apache.org/jira/browse/PIG-2707
Project: Pig
Issue Type: Bug
Affects Versions: 0.9.1
Environment: Amazon Elastic MapReduce. Hadoop 0.20.205
Reporter: Christian Birch
Priority: Minor
Using e.g. 's3://foo/{14,15,16}' to load files works like a charm, but neither
's3://foo/{14..16}' nor 's3://foo/{14...16}' works (I am not sure whether the
range syntax uses two or three dots, since both fail). Either way, I get errors
like the following when using a range:
Failed Jobs:
JobId   Alias   Feature Message Outputs
N/A     A       MAP_ONLY        Message: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input Pattern s3://foo/{14...16} matches 0 files
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:282)
    at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:999)
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1016)
    at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:172)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:934)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:887)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:887)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:861)
    at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
    at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
    at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
    at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input Pattern s3://foo/{14...16} matches 0 files
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:270)
    ... 14 more
hdfs://10.53.9.207:9000/tmp/temp-783548169/tmp1508748976,

Input(s):
Failed to read data from "s3://foo/{14...16}"

Output(s):
Failed to produce result in "hdfs://10.53.9.207:9000/tmp/temp-783548169/tmp1508748976"

Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
null
---
I would expect {14...16} to work just like {14,15,16}. Here is the same job with the comma form, which succeeds:
2012-05-17 18:29:59,098 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2012-05-17 18:29:59,164 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-05-17 18:29:59,165 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-05-17 18:29:59,165 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2012-05-17 18:29:59,182 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-05-17 18:29:59,182 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-05-17 18:31:14,493 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2012-05-17 18:31:14,567 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2012-05-17 18:31:14,582 [Thread-30] INFO org.apache.hadoop.mapred.JobClient - Default number of map tasks: null
2012-05-17 18:31:14,583 [Thread-30] INFO org.apache.hadoop.mapred.JobClient - Setting default number of map tasks based on cluster size to : 8
2012-05-17 18:31:14,583 [Thread-30] INFO org.apache.hadoop.mapred.JobClient - Default number of reduce tasks: 0
2012-05-17 18:31:15,072 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2012-05-17 18:31:16,870 [Thread-30] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2012-05-17 18:31:16,870 [Thread-30] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2012-05-17 18:31:18,099 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201205171523_0033
2012-05-17 18:31:18,099 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://10.53.9.207:9100/jobdetails.jsp?jobid=job_201205171523_0033
2012-05-17 18:31:58,609 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2012-05-17 18:32:08,186 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2012-05-17 18:32:08,187 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion   PigVersion  UserId  StartedAt            FinishedAt           Features
0.20.205        0.9.1-amzn  hadoop  2012-05-17 18:29:59  2012-05-17 18:32:08  UNKNOWN

Success!

Job Stats (time in seconds):
JobId                   Maps  Reduces  MaxMapTime  MinMapTime  AvgMapTime  MaxReduceTime  MinReduceTime  AvgReduceTime  Alias  Feature   Outputs
job_201205171523_0033   1     0        12          12          12          0              0              0              A      MAP_ONLY  hdfs://10.53.9.207:9000/tmp/temp-783548169/tmp1447928118,
Input(s):
Successfully read 3 records (410 bytes) from: "s3://foo/{14,15,16}"

Output(s):
Successfully stored 3 records (1405 bytes) in: "hdfs://10.53.9.207:9000/tmp/temp-783548169/tmp1447928118"

Counters:
Total records written : 3
Total bytes written : 1405
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_201205171523_0033
---
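As a stopgap, the range can be expanded into the comma form by hand before building the load path. A rough sketch in plain Java (the class and helper names are made up for illustration, not anything from Pig or Hadoop):

// Hypothetical helper (not a Pig or Hadoop API): expands a numeric range
// into the comma-separated brace form that does match.
public class RangeGlob {
    public static String rangeGlob(String prefix, int from, int to) {
        StringBuilder sb = new StringBuilder(prefix).append('{');
        for (int i = from; i <= to; i++) {
            if (i > from) {
                sb.append(',');
            }
            sb.append(i);
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        // Prints s3://foo/{14,15,16} -- the pattern used in the successful run above.
        System.out.println(rangeGlob("s3://foo/", 14, 16));
    }
}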
I am not sure whether this is a Pig/Hadoop issue or an Amazon EMR/S3 issue.
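To help narrow that down, the patterns could be checked directly against Hadoop's FileSystem glob matching, bypassing Pig entirely. A rough sketch (same placeholder paths as above; it assumes the S3 filesystem and credentials are configured the same way the Pig job sees them):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class GlobCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // The comma form, which matches in the Pig run above.
        Path commaGlob = new Path("s3://foo/{14,15,16}");
        // The range form, which matches 0 files in the Pig run above.
        Path rangeGlob = new Path("s3://foo/{14..16}");

        FileSystem fs = commaGlob.getFileSystem(conf);
        FileStatus[] commaMatches = fs.globStatus(commaGlob);
        FileStatus[] rangeMatches = fs.globStatus(rangeGlob);

        System.out.println("comma glob matches: "
                + (commaMatches == null ? 0 : commaMatches.length));
        System.out.println("range glob matches: "
                + (rangeMatches == null ? 0 : rangeMatches.length));
    }
}

If the range pattern already comes back with zero matches here, the limitation would seem to be in Hadoop's glob matching rather than in Pig or EMR.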