[
https://issues.apache.org/jira/browse/PIG-2707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Daniel Dai resolved PIG-2707.
-----------------------------
Resolution: Won't Fix
Try s3n://foo/1[4-6].
Pig mostly relies on Hadoop to parse globs (except that Pig additionally
accepts a comma-separated file list), and .. is not valid Hadoop globbing
syntax.
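Since Hadoop's glob parser supports {a,b,c} alternation and [4-6] character classes but not bash-style {14..16} numeric ranges, one workaround is to generate the comma form programmatically. A minimal sketch (the expand_range helper, and reusing the s3://foo prefix and 14-16 range from the report, are illustrative, not part of Pig or Hadoop):

```python
def expand_range(prefix, start, end, suffix=""):
    """Build a {a,b,c} alternation glob for an inclusive numeric range,
    which Hadoop's glob parser does understand."""
    alts = ",".join(str(i) for i in range(start, end + 1))
    return "%s{%s}%s" % (prefix, alts, suffix)

# The resulting path can be passed to a Pig LOAD statement.
path = expand_range("s3://foo/", 14, 16)
print(path)  # s3://foo/{14,15,16}
```

For small contiguous ranges, a character class such as 1[4-6] (as suggested above) avoids the helper entirely.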
> Range globs do not work
> -----------------------
>
> Key: PIG-2707
> URL: https://issues.apache.org/jira/browse/PIG-2707
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.9.1
> Environment: Amazon Elastic MapReduce. Hadoop 0.20.205
> Reporter: Christian Birch
> Priority: Minor
>
> Using e.g. 's3://foo/{14,15,16}' to load files works like a charm, but neither
> 's3://foo/{14..16}' nor 's3://foo/{14...16}' works (I am not sure whether it
> is two or three dots, since both fail). Either way, I get errors like the
> following when using ranges (regardless of two or three dots):
> Failed Jobs:
> JobId Alias Feature Message Outputs
> N/A A MAP_ONLY Message:
> org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input
> Pattern s3://foo/{14...16} matches 0 files
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:282)
> at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:999)
> at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1016)
> at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:172)
> at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:934)
> at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:887)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
> at
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:887)
> at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:861)
> at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
> at
> org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
> at
> org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input
> Pattern s3://foo/{14...16} matches 0 files
> at
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235)
> at
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:270)
> ... 14 more
> hdfs://10.53.9.207:9000/tmp/temp-783548169/tmp1508748976,
> Input(s):
> Failed to read data from "s3://foo/{14...16}"
> Output(s):
> Failed to produce result in
> "hdfs://10.53.9.207:9000/tmp/temp-783548169/tmp1508748976"
> Counters:
> Total records written : 0
> Total bytes written : 0
> Spillable Memory Manager spill count : 0
> Total bags proactively spilled: 0
> Total records proactively spilled: 0
> Job DAG:
> null
> ---
> I would expect {14...16} to work just like {14,15,16}:
> 2012-05-17 18:29:59,098 [main] INFO
> org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script:
> UNKNOWN
> 2012-05-17 18:29:59,164 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler -
> File concatenation threshold: 100 optimistic? false
> 2012-05-17 18:29:59,165 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
> - MR plan size before optimization: 1
> 2012-05-17 18:29:59,165 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
> - MR plan size after optimization: 1
> 2012-05-17 18:29:59,182 [main] INFO
> org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to
> the job
> 2012-05-17 18:29:59,182 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
> - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
> 2012-05-17 18:31:14,493 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
> - Setting up single store job
> 2012-05-17 18:31:14,567 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 1 map-reduce job(s) waiting for submission.
> 2012-05-17 18:31:14,582 [Thread-30] INFO org.apache.hadoop.mapred.JobClient
> - Default number of map tasks: null
> 2012-05-17 18:31:14,583 [Thread-30] INFO org.apache.hadoop.mapred.JobClient
> - Setting default number of map tasks based on cluster size to : 8
> 2012-05-17 18:31:14,583 [Thread-30] INFO org.apache.hadoop.mapred.JobClient
> - Default number of reduce tasks: 0
> 2012-05-17 18:31:15,072 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 0% complete
> 2012-05-17 18:31:16,870 [Thread-30] INFO
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to
> process : 1
> 2012-05-17 18:31:16,870 [Thread-30] INFO
> org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input
> paths (combined) to process : 1
> 2012-05-17 18:31:18,099 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - HadoopJobId: job_201205171523_0033
> 2012-05-17 18:31:18,099 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - More information at:
> http://10.53.9.207:9100/jobdetails.jsp?jobid=job_201205171523_0033
> 2012-05-17 18:31:58,609 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 50% complete
> 2012-05-17 18:32:08,186 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 100% complete
> 2012-05-17 18:32:08,187 [main] INFO
> org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
> HadoopVersion PigVersion UserId StartedAt FinishedAt Features
> 0.20.205 0.9.1-amzn hadoop 2012-05-17 18:29:59 2012-05-17
> 18:32:08 UNKNOWN
> Success!
> Job Stats (time in seconds):
> JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime
> MaxReduceTime MinReduceTime AvgReduceTime Alias Feature Outputs
> job_201205171523_0033 1 0 12 12 12 0 0
> 0 A MAP_ONLY
> hdfs://10.53.9.207:9000/tmp/temp-783548169/tmp1447928118,
> Input(s):
> Successfully read 3 records (410 bytes) from: "s3://foo/{14,15,16}"
> Output(s):
> Successfully stored 3 records (1405 bytes) in:
> "hdfs://10.53.9.207:9000/tmp/temp-783548169/tmp1447928118"
> Counters:
> Total records written : 3
> Total bytes written : 1405
> Spillable Memory Manager spill count : 0
> Total bags proactively spilled: 0
> Total records proactively spilled: 0
> Job DAG:
> job_201205171523_0033
> ---
> I am not sure whether this is a Pig/Hadoop issue or an Amazon EMR/S3 issue.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira