Hunt Tang created SPARK-11589:
---------------------------------

             Summary: Cannot create files by Hadoop FileSystem in JavaRDD.foreach
                 Key: SPARK-11589
                 URL: https://issues.apache.org/jira/browse/SPARK-11589
             Project: Spark
          Issue Type: Bug
    Affects Versions: 1.5.1
            Reporter: Hunt Tang
            Priority: Blocker


I'm using Hadoop 2.6.0 and Spark 1.5.1.
I want to write zip files through the Hadoop DistributedFileSystem inside 
JavaRDD.foreach; the sample code is below. Both test code 1 and test code 2 
run normally when the master is set to local mode. When it is set to yarn 
mode (either yarn-client or yarn-cluster), however, the files from test code 2 
are not created, and no error is logged.

{code:java}
        Configuration fileSystemConf = new Configuration();
        fileSystemConf.set("fs.defaultFS", "hdfs://myHostname:9000");

        // Test code 1
        FileSystem fsTemp = FileSystem.get(fileSystemConf);
        FSDataOutputStream fosTemp = fsTemp.create(new Path(output + "test.zip"), true);
        ZipOutputStream zosTemp = new ZipOutputStream(fosTemp);
        zosTemp.putNextEntry(new ZipEntry("task.json"));
        zosTemp.write(new byte[1]);
        zosTemp.close();
        fosTemp.close();

        // Test code 2
        JavaPairRDD<Integer, Iterable<String>> packageImageIdGroup = packageImageIdsMap.groupByKey();
        packageImageIdGroup.foreach((packageImageIdsPair) -> {
            String packageName = String.format("%03d", packageImageIdsPair._1());
            String filename = output + packageName + ".zip";
            Iterable<String> packageImageIds = packageImageIdsPair._2();

            FileSystem fs = FileSystem.get(fileSystemConf);
            FSDataOutputStream fos = fs.create(new Path(filename), true);
            ZipOutputStream zos = new ZipOutputStream(fos);

            for (String imageId : packageImageIds) {
                String imageFilename = packageName + "/Image/" + imageId + ".jpg";
                zos.putNextEntry(new ZipEntry(imageFilename));
                zos.write(new byte[1]);
            }

            zos.close();
            fos.close();
        });
{code}

I know the streams should be opened in try-with-resources, but please 
disregard that for now.
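
For reference, here is a minimal sketch of the try-with-resources form of the 
zip-writing loop, using only java.util.zip. The class name ZipSketch, the 
helpers writeZip and listEntries, and the in-memory ByteArrayOutputStream that 
stands in for the FSDataOutputStream are assumptions made for illustration; 
they are not part of the actual job.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ZipSketch {
    // Hypothetical helper: builds one archive in memory. In the job above the
    // underlying stream would be the FSDataOutputStream from fs.create(...).
    static byte[] writeZip(String packageName, List<String> imageIds) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // Closing the ZipOutputStream writes the central directory and also
        // closes the wrapped stream, even if an entry write throws.
        try (ZipOutputStream zos = new ZipOutputStream(sink)) {
            for (String imageId : imageIds) {
                zos.putNextEntry(new ZipEntry(packageName + "/Image/" + imageId + ".jpg"));
                zos.write(new byte[1]); // one placeholder byte, as in the report
                zos.closeEntry();
            }
        }
        return sink.toByteArray();
    }

    // Hypothetical helper: reads the entry names back to check the archive.
    static List<String> listEntries(byte[] zip) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zip))) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        byte[] zip = writeZip("001", Arrays.asList("a", "b"));
        System.out.println(listEntries(zip));
    }
}
```

This only changes how the streams are closed; it is not meant as an 
explanation of the behaviour in yarn mode.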

Does anyone have a clue why the code does not work in yarn mode? I would 
very much appreciate any help. Thanks!



