It looks like your Spark job was running as user root, but your file
system operations were running as user jomernik. When Spark commits a job
it calls into the underlying file system (such as HDFS or S3) to rename
the temporary files to their final locations, so the job needs the correct
authorization on both the Spark side and the file system side. Could you
try writing a Spark DataFrame to this file system and check whether that
works?
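
For example, something like this (a rough sketch, assuming the pyspark
shell with a SparkSession named spark; the test path is just an example
under your home directory):

# Write a tiny DataFrame and let Spark commit (rename) it on MapR-FS.
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
df.write.mode("overwrite").parquet("maprfs:///user/jomernik/tmp/df_write_test")

# Then check who owns the committed files, e.g.:
#   hadoop fs -ls /user/jomernik/tmp/df_write_test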

Thanks
Yanbo

On Tue, Jun 27, 2017 at 8:47 PM, John Omernik <j...@omernik.com> wrote:

> Hello all, I am running PySpark 2.1.1 as the user jomernik. I am working
> through some documentation here:
>
> https://spark.apache.org/docs/latest/mllib-ensembles.html#random-forests
>
> I was working on the Random Forest Classification example and found it to
> be working. That said, when I try to save the model to my HDFS (MapR-FS
> in my case) I get a weird error.
>
> I tried to save here:
>
> model.save(sc, "maprfs:///user/jomernik/tmp/myRandomForestClassificationModel")
>
> /user/jomernik is my user directory and I have full access to the
> directory.
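>
> For reference, the model comes from essentially the Python example on
> that page (a rough sketch; my exact data path and parameters may differ):
>
> from pyspark.mllib.tree import RandomForest
> from pyspark.mllib.util import MLUtils
>
> # Load and split the sample LIBSVM data used in the documentation example.
> data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")
> trainingData, testData = data.randomSplit([0.7, 0.3])
>
> # Train a RandomForest classification model, as in the docs example;
> # the model.save(...) call above is then run on this model.
> model = RandomForest.trainClassifier(trainingData, numClasses=2,
>                                      categoricalFeaturesInfo={},
>                                      numTrees=3, featureSubsetStrategy="auto",
>                                      impurity="gini", maxDepth=4, maxBins=32)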
>
>
>
> All the directories down to
>
> /user/jomernik/tmp/myRandomForestClassificationModel/metadata/_temporary/0
> are owned by me with full permissions, but when I get to this directory,
> here is what ls shows:
>
> $ ls -ls
>
> total 1
>
> 1 drwxr-xr-x 2 root root 1 Jun 27 07:38 task_20170627123834_0019_m_000000
>
> 0 drwxr-xr-x 2 root root 0 Jun 27 07:38 _temporary
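>
> A quick way to check which user the tasks actually run as (a rough
> sketch, run from the same pyspark session):
>
> import getpass
>
> # User the driver runs as.
> print("driver user:", getpass.getuser())
>
> # Distinct user(s) the executors run as; uses the existing SparkContext sc.
> print("executor users:",
>       sc.parallelize(range(sc.defaultParallelism))
>         .map(lambda _: __import__("getpass").getuser())
>         .distinct()
>         .collect())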
>
> Am I doing something wrong here? Why is the temp stuff owned by root? Is
> there a bug in saving things due to this ownership?
>
> John
>
>
>
>
>
>
> Exception:
> Py4JJavaError: An error occurred while calling o338.save.
> : org.apache.hadoop.security.AccessControlException: User jomernik(user id 1000001) does has been denied access to rename /user/jomernik/tmp/myRandomForestClassificationModel/metadata/_temporary/0/task_20170627123834_0019_m_000000/part-00000 to /user/jomernik/tmp/myRandomForestClassificationModel/metadata/part-00000
> at com.mapr.fs.MapRFileSystem.rename(MapRFileSystem.java:1112)
> at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:461)
> at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:475)
> at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJobInternal(FileOutputCommitter.java:392)
> at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:364)
> at org.apache.hadoop.mapred.FileOutputCommitter.commitJob(FileOutputCommitter.java:136)
> at org.apache.spark.SparkHadoopWriter.commitJob(SparkHadoopWriter.scala:111)
> at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1227)
> at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1168)
> at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1168)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
> at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
> at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:1168)
> at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply$mcV$sp(PairRDDFunctions.scala:1071)
> at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply(PairRDDFunctions.scala:1037)
> at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply(PairRDDFunctions.scala:1037)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
> at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
> at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:1037)
> at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply$mcV$sp(PairRDDFunctions.scala:963)
> at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply(PairRDDFunctions.scala:963)
> at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply(PairRDDFunctions.scala:963)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
> at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
> at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:962)
> at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply$mcV$sp(RDD.scala:1489)
> at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply(RDD.scala:1468)
> at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply(RDD.scala:1468)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
> at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
> at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:1468)
> at org.apache.spark.mllib.tree.model.TreeEnsembleModel$SaveLoadV1_0$.save(treeEnsembleModels.scala:440)
> at org.apache.spark.mllib.tree.model.RandomForestModel.save(treeEnsembleModels.scala:66)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
> at py4j.Gateway.invoke(Gateway.java:280)
> at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
> at py4j.commands.CallCommand.execute(CallCommand.java:79)
> at py4j.GatewayConnection.run(GatewayConnection.java:214)
> at java.lang.Thread.run(Thread.java:745)
>
