If you are using kerberized HDFS, the spark principal (or whoever is
running the cluster) has to be declared as a proxy user:

https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Superusers.html
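
Concretely, that means proxy-user entries in the NameNode's core-site.xml.
A minimal sketch, assuming the principal is named "spark" (the principal
name and the wide-open hosts/groups values are placeholders you'd tighten
in a real cluster):

<!-- core-site.xml: let "spark" impersonate other users.
     Principal name, hosts and groups here are placeholder assumptions. -->
<property>
  <name>hadoop.proxyuser.spark.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.spark.groups</name>
  <value>*</value>
</property>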

Once that is done, you create a proxy UGI for the user you want to
impersonate:

import org.apache.hadoop.security.UserGroupInformation

val ugi = UserGroupInformation.createProxyUser("joe",
  UserGroupInformation.getLoginUser())

That UGI is then used to create the FS:

import java.security.PrivilegedExceptionAction

val proxyFS = ugi.doAs(new PrivilegedExceptionAction[FileSystem] {
  override def run(): FileSystem =
    FileSystem.newInstance(new URI("hdfs://nn1/home/user/"), conf)
})
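
(doAs is overloaded for both java.security.PrivilegedAction and
PrivilegedExceptionAction, so spelling out the anonymous class avoids the
overload ambiguity you'd hit with a bare Scala lambda; any IOException or
InterruptedException thrown inside the action propagates out of doAs.)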


The proxyFS will then do all of its IO as the given user, even when used
outside a doAs clause, e.g.

proxyFS.mkdirs(new Path("/home/user/alice/"))

FileSystem.get() also works on a per-UGI basis, so
FileSystem.get(new URI("hdfs://nn1"), conf) called inside ugi.doAs()
returns a different FS instance than the same call made outside the clause.
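
A minimal sketch of that caching behaviour, assuming a namenode at
hdfs://nn1 and a proxy user "joe" (both placeholders):

import java.net.URI
import java.security.PrivilegedExceptionAction
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.security.UserGroupInformation

val conf = new Configuration()
val uri = new URI("hdfs://nn1")  // assumed namenode URI

// Cached instance for the login user.
val loginFS = FileSystem.get(uri, conf)

val ugi = UserGroupInformation.createProxyUser("joe",
  UserGroupInformation.getLoginUser())

// Cached instance for the proxy user: the FS cache key includes the UGI,
// so this is a different object from loginFS.
val proxyFS = ugi.doAs(new PrivilegedExceptionAction[FileSystem] {
  override def run(): FileSystem = FileSystem.get(uri, conf)
})

assert(loginFS ne proxyFS)  // distinct cached instances per UGI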

Once you are done with the FS, close it. If you know you are completely
done with the user across all threads, you can release all of that user's
filesystems in one go:

FileSystem.closeAllForUGI(ugi)

This closes all filesystems for that user. This is critical in long-lived
processes, as otherwise you'll run out of memory/threads.
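
Putting the steps together, a rough end-to-end sketch (the user name
"alice", the namenode URI and the path are placeholders, not a definitive
recipe):

import java.net.URI
import java.security.PrivilegedExceptionAction
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.security.UserGroupInformation

val conf = new Configuration()
val ugi = UserGroupInformation.createProxyUser("alice",  // placeholder user
  UserGroupInformation.getLoginUser())
try {
  val fs = ugi.doAs(new PrivilegedExceptionAction[FileSystem] {
    override def run(): FileSystem =
      FileSystem.newInstance(new URI("hdfs://nn1/"), conf)
  })
  fs.mkdirs(new Path("/home/user/alice/"))  // IO runs as "alice"
} finally {
  // Release every filesystem still held for this UGI.
  FileSystem.closeAllForUGI(ugi)
}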

On Mon, 12 Apr 2021 at 16:20, Kwangsun Noh <nohkwang...@gmail.com> wrote:

> Hi, Spark users.
>
>
> I wanted to let arbitrary users create HDFS files, rather than the OS user
> who executes the Spark application.
>
>
> And I thought it would be possible using
> UserGroupInformation.createRemoteUser("other").doAs(…)
>
>
> However, in the Spark executors the files are still created by the OS user
> who launched the Spark application.
>
>
> I’ve tested this on both Spark Standalone and YARN and got the same
> result.
>
>
> Is it impossible to impersonate a Spark job user using
> UserGroupInformation.doAs?
>
>
> PS. In fact, I posted a similar question on the Spark user mailing list,
> but I didn’t get the answer I wanted:
>
>
>
> http://apache-spark-user-list.1001560.n3.nabble.com/Is-it-enable-to-use-Multiple-UGIs-in-One-Spark-Context-td39859.html
>
