Hunt Tang created SPARK-11589:
---------------------------------
Summary: Cannot create files by Hadoop FileSystem in JavaRDD.foreach
Key: SPARK-11589
URL: https://issues.apache.org/jira/browse/SPARK-11589
Project: Spark
Issue Type: Bug
Affects Versions: 1.5.1
Reporter: Hunt Tang
Priority: Blocker
I'm using Hadoop 2.6.0 and Spark 1.5.1.
I want to write zip files with the Hadoop DistributedFileSystem inside
JavaRDD.foreach; the sample code follows. Both test code 1 and test code 2 run
normally when the master is set to local mode. However, when the master is set
to yarn mode (either yarn-client or yarn-cluster), the files from test code 2
are not created, and no error is logged.
{code:java}
Configuration fileSystemConf = new Configuration();
fileSystemConf.set("fs.defaultFS", "hdfs://myHostname:9000");

// Test code 1
FileSystem fsTemp = FileSystem.get(fileSystemConf);
FSDataOutputStream fosTemp = fsTemp.create(new Path(output + "test.zip"), true);
ZipOutputStream zosTemp = new ZipOutputStream(fosTemp);
zosTemp.putNextEntry(new ZipEntry("task.json"));
zosTemp.write(new byte[1]);
zosTemp.close();
fosTemp.close();

// Test code 2
JavaPairRDD<Integer, Iterable<String>> packageImageIdGroup =
        packageImageIdsMap.groupByKey();
packageImageIdGroup.foreach((packageImageIdsPair) -> {
    String packageName = String.format("%03d", packageImageIdsPair._1());
    String filename = output + packageName + ".zip";
    Iterable<String> packageImageIds = packageImageIdsPair._2();
    FileSystem fs = FileSystem.get(fileSystemConf);
    FSDataOutputStream fos = fs.create(new Path(filename), true);
    ZipOutputStream zos = new ZipOutputStream(fos);
    for (String imageId : packageImageIds) {
        String imageFilename = packageName + "/Image/" + imageId + ".jpg";
        zos.putNextEntry(new ZipEntry(imageFilename));
        zos.write(new byte[1]);
    }
    zos.close();
    fos.close();
});
{code}
I know the stream instances should be opened in a try-with-resources block, but
please disregard that for now.
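For reference, the try-with-resources form could look roughly like the sketch
below. It is an illustration only and does not address the yarn-mode problem.
To keep it self-contained it writes to a plain java.io.OutputStream (an
assumption; in the code above this would be the FSDataOutputStream returned by
fs.create). The class name ZipSketch and the helper writeZip are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Arrays;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class ZipSketch {
    // Hypothetical helper: writes one placeholder byte per image entry,
    // mirroring the loop in test code 2. `out` stands in for the
    // FSDataOutputStream returned by fs.create(...).
    static void writeZip(OutputStream out, String packageName, Iterable<String> imageIds)
            throws IOException {
        // try-with-resources closes zos, which also closes the wrapped
        // stream, even if writing an entry throws.
        try (ZipOutputStream zos = new ZipOutputStream(out)) {
            for (String imageId : imageIds) {
                zos.putNextEntry(new ZipEntry(packageName + "/Image/" + imageId + ".jpg"));
                zos.write(new byte[1]);
                zos.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // An in-memory buffer replaces HDFS here so the sketch runs anywhere.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        writeZip(buffer, "001", Arrays.asList("a", "b"));
        System.out.println("wrote " + buffer.size() + " bytes");
    }
}
```

Because closing the ZipOutputStream also closes the stream it wraps, a separate
close call on the underlying stream is not needed in this form.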
Is there any clue as to why the code does not work in yarn mode? I would
greatly appreciate any help. Thanks!