[
https://issues.apache.org/jira/browse/HUDI-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17089852#comment-17089852
]
Sasikumar Venkatesh commented on HUDI-773:
------------------------------------------
[~vinoth] and [~garyli1019]
My setup is as follows: I have a file system in ADLS Gen2 (similar to an S3 bucket) and have set up OAuth-based connections.
{code:java}
spark.conf.set("fs.azure.account.auth.type.<<storage-account>>.dfs.core.windows.net", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type.<<storage-account>>.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id.<<storage-account>>.dfs.core.windows.net", "**redacted")
spark.conf.set("fs.azure.account.oauth2.client.secret.<<storage-account>>.dfs.core.windows.net", "**redacted")
spark.conf.set("fs.azure.account.oauth2.client.endpoint.<<storage-account>>.dfs.core.windows.net", "https://login.microsoftonline.com/<<dir-id>>/oauth2/token")
{code}
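One possibility worth checking (my assumption, not something confirmed in this thread): `spark.conf.set` applies to the Spark session configuration, and depending on the runtime those keys may not be copied into the Hadoop `Configuration` that Hudi consults via `Path.getFileSystem`. A hedged workaround sketch is to set the same keys directly on the SparkContext's Hadoop configuration:

```scala
// Sketch under an assumption: session-level spark.conf settings may not be
// visible to the Hadoop Configuration that Hudi resolves the FileSystem from.
// Setting the same keys on sparkContext.hadoopConfiguration makes them
// visible to any code that reads the Hadoop conf directly.
val hadoopConf = spark.sparkContext.hadoopConfiguration
hadoopConf.set("fs.azure.account.auth.type.<<storage-account>>.dfs.core.windows.net", "OAuth")
hadoopConf.set("fs.azure.account.oauth.provider.type.<<storage-account>>.dfs.core.windows.net",
  "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
hadoopConf.set("fs.azure.account.oauth2.client.id.<<storage-account>>.dfs.core.windows.net", "**redacted")
hadoopConf.set("fs.azure.account.oauth2.client.secret.<<storage-account>>.dfs.core.windows.net", "**redacted")
hadoopConf.set("fs.azure.account.oauth2.client.endpoint.<<storage-account>>.dfs.core.windows.net",
  "https://login.microsoftonline.com/<<dir-id>>/oauth2/token")
```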
The Hudi write is as follows:
{code:java}
val tableName = "customer"
customerDF.write.format("org.apache.hudi")
  .options(hudiOptions)
  .mode(SaveMode.Overwrite)
  .save("abfss://<<storage-account>>.dfs.core.windows.net/hudi-tables/customer")
{code}
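One thing that stands out in the save path (my observation, not confirmed in this thread): an ABFS URI normally carries the container (filesystem) name before the `@`, as in `abfss://<container>@<account>.dfs.core.windows.net/<path>`, and the path above has no container segment. A small sketch with hypothetical names `mycontainer`/`myaccount` shows how differently the two forms parse:

```scala
import java.net.URI

// Hypothetical container/account names, for illustration only.
val withContainer = new URI("abfss://mycontainer@myaccount.dfs.core.windows.net/hudi-tables/customer")
// The container arrives as the URI user-info, the account as the host:
println(withContainer.getUserInfo) // mycontainer
println(withContainer.getHost)     // myaccount.dfs.core.windows.net

val withoutContainer = new URI("abfss://myaccount.dfs.core.windows.net/hudi-tables/customer")
// With no "container@" part there is no user-info at all, which may be why
// the driver treats the whole authority as the account name:
println(withoutContainer.getUserInfo) // null
```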
When I try to write in Hudi format, I can see that it tries to load the ADLS credentials from org.apache.hadoop.fs.azurebfs.AbfsConfiguration, since my credential properties are set under `fs.azure.account.*`.
*I am wondering whether there is any special config I need to add to my Spark conf to write to ADLS in Hudi format.*
The stack trace of the error is given below:
{code:java}
Configuration property <<storage-account>>.dfs.core.windows.net not found.
  at shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.AbfsConfiguration.getStorageAccountKey(AbfsConfiguration.java:392)
  at shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.initializeClient(AzureBlobFileSystemStore.java:1008)
  at shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.<init>(AzureBlobFileSystemStore.java:151)
  at shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.initialize(AzureBlobFileSystem.java:106)
  at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
  at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
  at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:81)
  at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:91)
  at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:147)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:135)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$5.apply(SparkPlan.scala:188)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:184)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:135)
  at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:118)
  at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:116)
  at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:710)
  at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:710)
  at org.apache.spark.sql.execution.SQLExecution$$anonfun$withCustomExecutionEnv$1.apply(SQLExecution.scala:113)
  at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:242)
  at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:99)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:172)
  at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:710)
  at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:306)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:292)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:235)
  at line8353df56311d44ef989b6a6d378b55bd92.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-4384834320483321:4)
  at line8353df56311d44ef989b6a6d378b55bd92.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-4384834320483321:73)
  at line8353df56311d44ef989b6a6d378b55bd92.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-4384834320483321:75)
  at line8353df56311d44ef989b6a6d378b55bd92.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-4384834320483321:77)
  at line8353df56311d44ef989b6a6d378b55bd92.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-4384834320483321:79)
  at line8353df56311d44ef989b6a6d378b55bd92.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-4384834320483321:81)
  at line8353df56311d44ef989b6a6d378b55bd92.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-4384834320483321:83)
  at line8353df56311d44ef989b6a6d378b55bd92.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-4384834320483321:85)
  at line8353df56311d44ef989b6a6d378b55bd92.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-4384834320483321:87)
  at line8353df56311d44ef989b6a6d378b55bd92.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-4384834320483321:89)
  at line8353df56311d44ef989b6a6d378b55bd92.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-4384834320483321:91)
  at line8353df56311d44ef989b6a6d378b55bd92.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-4384834320483321:93)
  at line8353df56311d44ef989b6a6d378b55bd92.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-4384834320483321:95)
  at line8353df56311d44ef989b6a6d378b55bd92.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-4384834320483321:97)
  at line8353df56311d44ef989b6a6d378b55bd92.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-4384834320483321:99)
  at line8353df56311d44ef989b6a6d378b55bd92.$read$$iw$$iw$$iw$$iw$$iw.<init>(command-4384834320483321:101)
  at line8353df56311d44ef989b6a6d378b55bd92.$read$$iw$$iw$$iw$$iw.<init>(command-4384834320483321:103)
  at line8353df56311d44ef989b6a6d378b55bd92.$read$$iw$$iw$$iw.<init>(command-4384834320483321:105)
  at line8353df56311d44ef989b6a6d378b55bd92.$read$$iw$$iw.<init>(command-4384834320483321:107)
  at line8353df56311d44ef989b6a6d378b55bd92.$read$$iw.<init>(command-4384834320483321:109)
  at line8353df56311d44ef989b6a6d378b55bd92.$read.<init>(command-4384834320483321:111)
  at line8353df56311d44ef989b6a6d378b55bd92.$read$.<init>(command-4384834320483321:115)
  at line8353df56311d44ef989b6a6d378b55bd92.$read$.<clinit>(command-4384834320483321)
  at line8353df56311d44ef989b6a6d378b55bd92.$eval$.$print$lzycompute(<notebook>:7)
  at line8353df56311d44ef989b6a6d378b55bd92.$eval$.$print(<notebook>:6)
  at line8353df56311d44ef989b6a6d378b55bd92.$eval.$print(<notebook>)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:793)
  at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1054)
  at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:645)
  at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:644)
  at scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
  at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19)
  at scala.tools.nsc.interpreter.IMain$WrappedRequest.loadAndRunReq(IMain.scala:644)
  at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:576)
  at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:572)
  at com.databricks.backend.daemon.driver.DriverILoop.execute(DriverILoop.scala:215)
  at com.databricks.backend.daemon.driver.ScalaDriverLocal$$anonfun$repl$1.apply$mcV$sp(ScalaDriverLocal.scala:202)
  at com.databricks.backend.daemon.driver.ScalaDriverLocal$$anonfun$repl$1.apply(ScalaDriverLocal.scala:202)
  at com.databricks.backend.daemon.driver.ScalaDriverLocal$$anonfun$repl$1.apply(ScalaDriverLocal.scala:202)
  at com.databricks.backend.daemon.driver.DriverLocal$TrapExitInternal$.trapExit(DriverLocal.scala:714)
  at com.databricks.backend.daemon.driver.DriverLocal$TrapExit$.apply(DriverLocal.scala:667)
  at com.databricks.backend.daemon.driver.ScalaDriverLocal.repl(ScalaDriverLocal.scala:202)
  at com.databricks.backend.daemon.driver.DriverLocal$$anonfun$execute$9.apply(DriverLocal.scala:396)
  at com.databricks.backend.daemon.driver.DriverLocal$$anonfun$execute$9.apply(DriverLocal.scala:373)
  at com.databricks.logging.UsageLogging$$anonfun$withAttributionContext$1.apply(UsageLogging.scala:238)
  at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
  at com.databricks.logging.UsageLogging$class.withAttributionContext(UsageLogging.scala:233)
  at com.databricks.backend.daemon.driver.DriverLocal.withAttributionContext(DriverLocal.scala:49)
  at com.databricks.logging.UsageLogging$class.withAttributionTags(UsageLogging.scala:275)
  at com.databricks.backend.daemon.driver.DriverLocal.withAttributionTags(DriverLocal.scala:49)
  at com.databricks.backend.daemon.driver.DriverLocal.execute(DriverLocal.scala:373)
  at com.databricks.backend.daemon.driver.DriverWrapper$$anonfun$tryExecutingCommand$2.apply(DriverWrapper.scala:644)
  at com.databricks.backend.daemon.driver.DriverWrapper$$anonfun$tryExecutingCommand$2.apply(DriverWrapper.scala:644)
  at scala.util.Try$.apply(Try.scala:192)
  at com.databricks.backend.daemon.driver.DriverWrapper.tryExecutingCommand(DriverWrapper.scala:639)
  at com.databricks.backend.daemon.driver.DriverWrapper.getCommandOutputAndError(DriverWrapper.scala:485)
  at com.databricks.backend.daemon.driver.DriverWrapper.executeCommand(DriverWrapper.scala:597)
  at com.databricks.backend.daemon.driver.DriverWrapper.runInnerLoop(DriverWrapper.scala:390)
  at com.databricks.backend.daemon.driver.DriverWrapper.runInner(DriverWrapper.scala:337)
  at com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:219)
  at java.lang.Thread.run(Thread.java:748)
{code}
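For what it's worth, the top frame, `AbfsConfiguration.getStorageAccountKey`, is the lookup used for account-key (SharedKey) authentication, which would suggest the driver never saw an `OAuth` auth-type setting for the account name it parsed out of the URI. A hedged diagnostic (assuming the same Databricks notebook session) is to print what the Hadoop configuration actually contains just before the write:

```scala
// Hypothetical diagnostic, not from the original report: inspect the Hadoop
// Configuration that Path.getFileSystem will consult. A null here would mean
// the OAuth settings made with spark.conf.set never reached this Configuration.
val hc = spark.sparkContext.hadoopConfiguration
println(hc.get("fs.azure.account.auth.type.<<storage-account>>.dfs.core.windows.net"))
println(hc.get("fs.azure.account.oauth.provider.type.<<storage-account>>.dfs.core.windows.net"))
```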
> Hudi On Azure Data Lake Storage V2
> ----------------------------------
>
> Key: HUDI-773
> URL: https://issues.apache.org/jira/browse/HUDI-773
> Project: Apache Hudi (incubating)
> Issue Type: New Feature
> Components: Usability
> Reporter: Yanjia Gary Li
> Assignee: Yanjia Gary Li
> Priority: Minor
> Fix For: 0.6.0
>
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)