Hi Steve,

I don't think I fully understand your answer. Please pardon my naivety regarding the subject. From what I understand, the actual read happens in the executors, so the executors need access to the data lake. Given that, how do I programmatically pass Azure credentials to the executors so they can read the data they need to process?
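[Editor's note: a minimal sketch of one way to do this, assuming ADLS Gen1 (adl://) with the hadoop-azure-datalake connector on the classpath. Anything set in the driver's Hadoop configuration is shipped to the tasks, so the executors open the files with the same credentials. The client id, secret, refresh URL, account name and the getCredentialsForPath() call are all placeholders for whatever your credential REST API returns; the property names follow the hadoop-azure-datalake docs, and some older Hadoop releases use the dfs.adls.oauth2.* prefix instead.]

    import org.apache.hadoop.conf.Configuration;
    import org.apache.spark.sql.SparkSession;

    public class AdlCredentialSketch {
      public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("adl-credential-sketch")
            .getOrCreate();

        // Hypothetical call into your library that talks to the credential REST API:
        // MyCredentials creds = MyLibrary.getCredentialsForPath(userId, "/some/path/on/azure/datalake");

        // Settings placed in the driver's Hadoop configuration are serialized with
        // each task, so executors authenticate with the same OAuth credentials.
        Configuration hadoopConf = spark.sparkContext().hadoopConfiguration();
        hadoopConf.set("fs.adl.oauth2.access.token.provider.type", "ClientCredential");
        hadoopConf.set("fs.adl.oauth2.client.id", "<client-id-from-your-service>");       // placeholder
        hadoopConf.set("fs.adl.oauth2.credential", "<client-secret-from-your-service>");  // placeholder
        hadoopConf.set("fs.adl.oauth2.refresh.url",
            "https://login.microsoftonline.com/<tenant>/oauth2/token");                   // placeholder

        spark.read().json("adl://<account>.azuredatalakestore.net/some/path/people.json").show();

        spark.stop();
      }
    }

Note that these keys apply to every adl:// read in the job, so two datasets needing different tokens would likely require separate Hadoop configurations (or separate jobs), or a custom access token provider (provider type "Custom", which, if I recall correctly, points at a class extending org.apache.hadoop.fs.adl.oauth2.AzureADTokenProvider).]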
Another dilemma I have is that a user might access more than one data set within a job, let's say to join them. In that case, I might have two separate access tokens for reading from the data lake, one per data set. Does that make sense?

Imtiaz

On Sat, Aug 19, 2017 at 7:04 AM, Steve Loughran <ste...@hortonworks.com> wrote:
>
> On 19 Aug 2017, at 02:42, Imtiaz Ahmed <emtiazah...@gmail.com> wrote:
>
> Hi All,
>
> I am building a Spark library which developers will use when writing their
> Spark jobs to get access to data on Azure Data Lake. But the authentication
> will depend on the dataset they ask for. I need to call a REST API from
> within a Spark job to get credentials and authenticate to read data from
> ADLS. Is that even possible? I am new to Spark.
> E.g., from inside a Spark job a user will say:
>
>     MyCredentials myCredentials = MyLibrary.getCredentialsForPath(userId,
>         "/some/path/on/azure/datalake");
>
> then before spark.read.json("adl://examples/src/main/resources/people.json")
> I need to authenticate the user to be able to read that path using the
> credentials fetched above.
>
> Any help is appreciated.
>
> Thanks,
> Imtiaz
>
>
> The ADL filesystem supports addDelegationTokens(), allowing the caller to
> collect the delegation tokens of the current authenticated user and then
> pass them along with the request, which is exactly what Spark should be
> doing in spark-submit.
>
> If you want to do it yourself, look in SparkHadoopUtil (I think; IDE is
> closed right now) and see how the tokens are picked up and then passed
> around (marshalled over the job request, unmarshalled afterwards and picked
> up, with bits of the UserGroupInformation class doing the low-level work).
>
> Java code snippet to write the tokens to the path tokenFile:
>
>     FileSystem fs = FileSystem.get(conf);
>     Credentials cred = new Credentials();
>     Token<?>[] tokens = fs.addDelegationTokens(renewer, cred);
>     cred.writeTokenStorageFile(tokenFile, conf);
>
> You can then read that file in elsewhere, and then (somehow) get the FS to
> use those tokens.
>
> Otherwise, ADL supports OAuth, so you may be able to use any OAuth
> libraries for this. hadoop-azure-datalake pulls in okhttp for that:
>
>     <dependency>
>       <groupId>com.squareup.okhttp</groupId>
>       <artifactId>okhttp</artifactId>
>       <version>2.4.0</version>
>     </dependency>
>
> -Steve
>
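[Editor's note: to round out the write-side snippet in Steve's quoted reply, here is a hedged sketch of what the read-back side might look like. Credentials.readTokenStorageFile() and UserGroupInformation.addCredentials() are standard Hadoop security APIs; tokenFile and conf are placeholders, and whether the ADL filesystem actually honours delegation tokens attached this way is exactly the "(somehow)" Steve flags, so treat it as something to verify rather than a known-good recipe.]

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.security.Credentials;
    import org.apache.hadoop.security.UserGroupInformation;

    public class TokenReadBack {
      public static void attachTokens(Path tokenFile, Configuration conf) throws Exception {
        // Read the serialized delegation tokens back from the file written by
        // Credentials.writeTokenStorageFile() in the snippet above...
        Credentials cred = Credentials.readTokenStorageFile(tokenFile, conf);
        // ...and attach them to the current user, so FileSystem instances created
        // as this user can present those tokens when authenticating.
        UserGroupInformation.getCurrentUser().addCredentials(cred);
      }
    }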