Balaji Balasubramaniam created SPARK-36720:
----------------------------------------------
Summary: In overwrite mode, setting the truncate option to true doesn't truncate the table
Key: SPARK-36720
URL: https://issues.apache.org/jira/browse/SPARK-36720
Project: Spark
Issue Type: Bug
Components: PySpark
Affects Versions: 3.1.1
Reporter: Balaji Balasubramaniam
I'm using PySpark in an AWS Glue job to write data to SAP HANA via JDBC. Our
requirement is to truncate and then load the data in HANA. I've tried both of
the options below, and in both cases, based on the stack trace, Spark tries to
drop the table, which our security design does not allow.
#df_lake.write.format("jdbc").option("url", edw_jdbc_url).option("driver",
"com.sap.db.jdbc.Driver").option("dbtable", edw_jdbc_db_table).option("user",
edw_jdbc_userid).option("password", edw_jdbc_password).option("truncate",
"true").mode("append").save()
properties=\{"user": edw_jdbc_userid, "password": edw_jdbc_password,
"truncate":"true"}
df_lake.write.jdbc(url=edw_jdbc_url, table=edw_jdbc_db_table, mode='overwrite',
properties=properties)
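(Per the DataFrameWriter docs, the truncate option only takes effect in
overwrite mode, so the commented-out append attempt above should ignore it
entirely; the drop must come from the overwrite attempt.) If I'm reading
JdbcRelationProvider.createRelation correctly, on overwrite Spark issues
TRUNCATE TABLE only when the JDBC dialect reports that truncation does not
cascade, and since Spark 3.1.1 ships no built-in dialect for SAP HANA, the
default answer is unknown and Spark falls back to DROP + CREATE. A rough
Python paraphrase of that branch (illustration only; the real code is Scala
and the function name here is mine):

def overwrite_existing_table(truncate_requested, dialect_non_cascading):
    # Spark truncates only when the truncate option is set AND the dialect's
    # isCascadingTruncateTable() returns Some(false); the default dialect
    # returns None, so for HANA the check fails.
    if truncate_requested and dialect_non_cascading:
        return "TRUNCATE TABLE <table>"    # table kept; only TRUNCATE privilege needed
    return "DROP TABLE <table>; CREATE TABLE ..."    # the path in the stack trace below

That would explain the JdbcUtils$.dropTable frame in the trace below.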
I've verified that the schemas match: I read the table back via JDBC and
printed its schema, and printed the schema of the source DataFrame as well.
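The check looked roughly like this (a sketch reusing the variable names above,
and assuming spark is the active SparkSession and df_lake is the source
DataFrame):

df_hana = spark.read.format("jdbc") \
    .option("url", edw_jdbc_url) \
    .option("driver", "com.sap.db.jdbc.Driver") \
    .option("dbtable", edw_jdbc_db_table) \
    .option("user", edw_jdbc_userid) \
    .option("password", edw_jdbc_password) \
    .load()
df_hana.printSchema()    # output under "Schema from HANA" below
df_lake.printSchema()    # output under "Schema from the source table" below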
Schema from HANA:
root
|-- RTL_ACCT_ID: long (nullable = true)
|-- FINE_DINING_PROPOSED: string (nullable = true)
|-- FINE_WINE_PROPOSED: string (nullable = true)
|-- FINE_WINE_INF_PROPOSED: string (nullable = true)
|-- GOLD_SILVER_PROPOSED: string (nullable = true)
|-- PREMIUM_PROPOSED: string (nullable = true)
|-- GSP_PROPOSED: string (nullable = true)
|-- PROPOSED_CRAFT: string (nullable = true)
|-- FW_REASON: string (nullable = true)
|-- FWI_REASON: string (nullable = true)
|-- GS_REASON: string (nullable = true)
|-- PREM_REASON: string (nullable = true)
|-- FD_REASON: string (nullable = true)
|-- CRAFT_REASON: string (nullable = true)
|-- GSP_FLAG: string (nullable = true)
|-- GSP_REASON: string (nullable = true)
|-- ELIGIBILITY: string (nullable = true)
|-- DW_LD_S: timestamp (nullable = true)
Schema from the source table:
root
|-- RTL_ACCT_ID: long (nullable = true)
|-- FINE_DINING_PROPOSED: string (nullable = true)
|-- FINE_WINE_PROPOSED: string (nullable = true)
|-- FINE_WINE_INF_PROPOSED: string (nullable = true)
|-- GOLD_SILVER_PROPOSED: string (nullable = true)
|-- PREMIUM_PROPOSED: string (nullable = true)
|-- GSP_PROPOSED: string (nullable = true)
|-- PROPOSED_CRAFT: string (nullable = true)
|-- FW_REASON: string (nullable = true)
|-- FWI_REASON: string (nullable = true)
|-- GS_REASON: string (nullable = true)
|-- PREM_REASON: string (nullable = true)
|-- FD_REASON: string (nullable = true)
|-- CRAFT_REASON: string (nullable = true)
|-- GSP_FLAG: string (nullable = true)
|-- GSP_REASON: string (nullable = true)
|-- ELIGIBILITY: string (nullable = true)
|-- DW_LD_S: timestamp (nullable = true)
This is the stack trace:
py4j.protocol.Py4JJavaError: An error occurred while calling o169.jdbc.
: com.sap.db.jdbc.exceptions.JDBCDriverException: SAP DBTech JDBC: [258]: insufficient privilege: Detailed info for this error can be found with guid 'xxxx'
        at com.sap.db.jdbc.exceptions.SQLExceptionSapDB._newInstance(SQLExceptionSapDB.java:191)
        at com.sap.db.jdbc.exceptions.SQLExceptionSapDB.newInstance(SQLExceptionSapDB.java:42)
        at com.sap.db.jdbc.packet.HReplyPacket._buildExceptionChain(HReplyPacket.java:976)
        at com.sap.db.jdbc.packet.HReplyPacket.getSQLExceptionChain(HReplyPacket.java:157)
        at com.sap.db.jdbc.packet.HPartInfo.getSQLExceptionChain(HPartInfo.java:39)
        at com.sap.db.jdbc.ConnectionSapDB._receive(ConnectionSapDB.java:3476)
        at com.sap.db.jdbc.ConnectionSapDB.exchange(ConnectionSapDB.java:1568)
        at com.sap.db.jdbc.StatementSapDB._executeDirect(StatementSapDB.java:1435)
        at com.sap.db.jdbc.StatementSapDB._execute(StatementSapDB.java:1414)
        at com.sap.db.jdbc.StatementSapDB._execute(StatementSapDB.java:1399)
        at com.sap.db.jdbc.StatementSapDB._executeUpdate(StatementSapDB.java:1387)
        at com.sap.db.jdbc.StatementSapDB.executeUpdate(StatementSapDB.java:175)
        at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.executeStatement(JdbcUtils.scala:993)
        at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.dropTable(JdbcUtils.scala:93)
        at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:61)
        at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
        at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:185)
        at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:223)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:220)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:181)
        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:134)
        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:133)
        at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
        at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
        at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
        at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:110)
        at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:135)
        at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
        at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
        at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:135)
        at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:253)
        at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:134)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
        at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
        at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
        at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:301)
        at org.apache.spark.sql.DataFrameWriter.jdbc(DataFrameWriter.scala:817)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:748)
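Note the JdbcUtils$.dropTable frame above: Spark took the drop-and-recreate
path rather than truncating. As a possible workaround while this is open (a
sketch only, not verified against HANA; it assumes the database user has the
TRUNCATE privilege, the SAP driver jar is on the driver classpath, and spark
is the active SparkSession), one can truncate explicitly over JDBC through the
Py4J gateway and then append, so Spark never tries to drop the table:

# Load the driver and truncate the target table directly over JDBC
# (all variable names as defined above).
jvm = spark.sparkContext._jvm
jvm.java.lang.Class.forName("com.sap.db.jdbc.Driver")
conn = jvm.java.sql.DriverManager.getConnection(
    edw_jdbc_url, edw_jdbc_userid, edw_jdbc_password)
try:
    stmt = conn.createStatement()
    stmt.executeUpdate("TRUNCATE TABLE " + edw_jdbc_db_table)
    stmt.close()
finally:
    conn.close()

# Then load with append mode, which never drops or recreates the table.
df_lake.write.format("jdbc") \
    .option("url", edw_jdbc_url) \
    .option("driver", "com.sap.db.jdbc.Driver") \
    .option("dbtable", edw_jdbc_db_table) \
    .option("user", edw_jdbc_userid) \
    .option("password", edw_jdbc_password) \
    .mode("append").save()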