[jira] [Commented] (MAPREDUCE-5050) Cannot find partition.lst in Terasort on Hadoop/Local File System

Ewan Higgs (JIRA) Fri, 29 Aug 2014 02:48:08 -0700

    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115061#comment-14115061
 ]


Ewan Higgs commented on MAPREDUCE-5050:
---------------------------------------

This is a duplicate of MAPREDUCE-5528.

> Cannot find partition.lst in Terasort on Hadoop/Local File System
> -----------------------------------------------------------------
>
>                 Key: MAPREDUCE-5050
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5050
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: examples
>    Affects Versions: 0.20.2
>         Environment: Cloudera VM CDH3u4, VMWare, Linux, Java SE 1.6.0_31-b04
>            Reporter: Matt Parker
>            Priority: Minor
>
> I'm trying to simulate running Hadoop on Lustre by configuring it to use the 
> local file system using a single cloudera VM (cdh3u4).
> I can generate the data just fine, but when running the sorting portion of 
> the program, I get an error about not being able to find the _partition.lst 
> file. It exists in the generated data directory.
> Perusing the Terasort code, I see in the main method that has a Path 
> reference to partition.lst, which is created with the parent directory.
>   public int run(String[] args) throws Exception {
>        LOG.info("starting");
>       JobConf job = (JobConf) getConf();
> >>  Path inputDir = new Path(args[0]);
> >>  inputDir = inputDir.makeQualified(inputDir.getFileSystem(job));
> >>  Path partitionFile = new Path(inputDir, 
> >> TeraInputFormat.PARTITION_FILENAME);
>       URI partitionUri = new URI(partitionFile.toString() +
>                                "#" + TeraInputFormat.PARTITION_FILENAME);
>       TeraInputFormat.setInputPaths(job, new Path(args[0]));
>       FileOutputFormat.setOutputPath(job, new Path(args[1]));
>       job.setJobName("TeraSort");
>       job.setJarByClass(TeraSort.class);
>       job.setOutputKeyClass(Text.class);
>       job.setOutputValueClass(Text.class);
>       job.setInputFormat(TeraInputFormat.class);
>       job.setOutputFormat(TeraOutputFormat.class);
>       job.setPartitionerClass(TotalOrderPartitioner.class);
>       TeraInputFormat.writePartitionFile(job, partitionFile);
>       DistributedCache.addCacheFile(partitionUri, job);
>       DistributedCache.createSymlink(job);
>       job.setInt("dfs.replication", 1);
>       TeraOutputFormat.setFinalSync(job, true);
>       JobClient.runJob(job);
>       LOG.info("done");
>       return 0;
>   }
> But in the configure method, the Path isn't created with the parent directory 
> reference.
>     public void configure(JobConf job) {
>       try {
>         FileSystem fs = FileSystem.getLocal(job);
> >>    Path partFile = new Path(TeraInputFormat.PARTITION_FILENAME);
>         splitPoints = readPartitions(fs, partFile, job);
>         trie = buildTrie(splitPoints, 0, splitPoints.length, new Text(), 2);
>       } catch (IOException ie) {
>         throw new IllegalArgumentException("can't read paritions file", ie);
>       }
>     }
> I modified the code as follows, and now sorting portion of the Terasort test 
> works using the
> general file system. I think the above code is a bug.
>     public void configure(JobConf job) {
>       try {
>         FileSystem fs = FileSystem.getLocal(job);
>   >>  Path[] inputPaths = TeraInputFormat.getInputPaths(job);
>   >>  Path partFile = new Path(inputPaths[0], 
> TeraInputFormat.PARTITION_FILENAME);
>         splitPoints = readPartitions(fs, partFile, job);
>         trie = buildTrie(splitPoints, 0, splitPoints.length, new Text(), 2);
>       } catch (IOException ie) {
>         throw new IllegalArgumentException("can't read paritions file", ie);
>       }
>     }



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (MAPREDUCE-5050) Cannot find partition.lst in Terasort on Hadoop/Local File System

Reply via email to