[ 
https://issues.apache.org/jira/browse/SPARK-31099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056549#comment-17056549
 ] 

Kris Mok commented on SPARK-31099:
----------------------------------

Just documenting the fact that users may encounter migration issues when 
upgrading from earlier versions of Spark to Spark 3.0 due to the Hive profile 
upgrade sounds good to me.

Derby migration is unlikely to be a production issue, and for other databases 
(MySQL / PG etc) they’re heavy enough that folks would probably realize it’s a 
Hive metastore migration issue just like what’d happen in Hive.

But the documentation should at the very least describe:
* upgraded Hive profile
* what kind of error messages could occur
* links to Hive documentation of how to perform the upgrade

WDYT?

> Create migration script for metastore_db
> ----------------------------------------
>
>                 Key: SPARK-31099
>                 URL: https://issues.apache.org/jira/browse/SPARK-31099
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Gengliang Wang
>            Priority: Major
>
> When an existing Derby database exists (in ./metastore_db) created by Hive 
> 1.2.x profile, it'll fail to upgrade itself to the Hive 2.3.x profile.
> Repro steps:
> 1. Build OSS or DBR master with SBT with -Phive-1.2 -Phive 
> -Phive-thriftserver. Make sure there's no existing ./metastore_db directory 
> in the repo.
> 2. Run bin/spark-shell, and then spark.sql("show databases"). This will 
> populate the ./metastore_db directory, where the Derby-based metastore 
> database is hosted. This database is populated from Hive 1.2.x.
> 3. Re-build OSS or DBR master with SBT with -Phive -Phive-thriftserver (drops 
> the Hive 1.2 profile, which makes it use the default Hive 2.3 profile)
> 4. Repeat Step (2) above. This will trigger Hive 2.3.x to load the Derby 
> database created in Step (2), which triggers an upgrade step, and that's 
> where the following error will be reported.
> 5. Delete the ./metastore_db and re-run Step (4). The error is no longer 
> reported.
> {code:java}
> 20/03/09 13:57:04 ERROR Datastore: Error thrown executing ALTER TABLE TBLS 
> ADD IS_REWRITE_ENABLED CHAR(1) NOT NULL CHECK (IS_REWRITE_ENABLED IN 
> ('Y','N')) : In an ALTER TABLE statement, the column 'IS_REWRITE_ENABLED' has 
> been specified as NOT NULL and either the DEFAULT clause was not specified or 
> was specified as DEFAULT NULL.
> java.sql.SQLSyntaxErrorException: In an ALTER TABLE statement, the column 
> 'IS_REWRITE_ENABLED' has been specified as NOT NULL and either the DEFAULT 
> clause was not specified or was specified as DEFAULT NULL.
>       at 
> org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
>       at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown 
> Source)
>       at 
> org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown 
> Source)
>       at 
> org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown 
> Source)
>       at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown 
> Source)
>       at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown 
> Source)
>       at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
>       at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
>       at com.jolbox.bonecp.StatementHandle.execute(StatementHandle.java:254)
>       at 
> org.datanucleus.store.rdbms.table.AbstractTable.executeDdlStatement(AbstractTable.java:879)
>       at 
> org.datanucleus.store.rdbms.table.AbstractTable.executeDdlStatementList(AbstractTable.java:830)
>       at 
> org.datanucleus.store.rdbms.table.TableImpl.validateColumns(TableImpl.java:257)
>       at 
> org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.performTablesValidation(RDBMSStoreManager.java:3398)
>       at 
> org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.run(RDBMSStoreManager.java:2896)
>       at 
> org.datanucleus.store.rdbms.AbstractSchemaTransaction.execute(AbstractSchemaTransaction.java:119)
>       at 
> org.datanucleus.store.rdbms.RDBMSStoreManager.manageClasses(RDBMSStoreManager.java:1627)
>       at 
> org.datanucleus.store.rdbms.RDBMSStoreManager.getDatastoreClass(RDBMSStoreManager.java:672)
>       at 
> org.datanucleus.store.rdbms.query.RDBMSQueryUtils.getStatementForCandidates(RDBMSQueryUtils.java:425)
>       at 
> org.datanucleus.store.rdbms.query.JDOQLQuery.compileQueryFull(JDOQLQuery.java:865)
>       at 
> org.datanucleus.store.rdbms.query.JDOQLQuery.compileInternal(JDOQLQuery.java:347)
>       at org.datanucleus.store.query.Query.executeQuery(Query.java:1816)
>       at org.datanucleus.store.query.Query.executeWithArray(Query.java:1744)
>       at org.datanucleus.store.query.Query.execute(Query.java:1726)
>       at org.datanucleus.api.jdo.JDOQuery.executeInternal(JDOQuery.java:374)
>       at org.datanucleus.api.jdo.JDOQuery.execute(JDOQuery.java:216)
>       at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.ensureDbInit(MetaStoreDirectSql.java:184)
>       at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.<init>(MetaStoreDirectSql.java:144)
>       at 
> org.apache.hadoop.hive.metastore.ObjectStore.initializeHelper(ObjectStore.java:410)
>       at 
> org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:342)
>       at 
> org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:303)
>       at 
> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
>       at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
>       at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:58)
>       at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:67)
>       at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStoreForConf(HiveMetaStore.java:628)
>       at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMSForConf(HiveMetaStore.java:594)
>       at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:588)
>       at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:655)
>       at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:431)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
>       at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
>       at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:79)
>       at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:92)
>       at 
> org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:6902)
>       at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:164)
>       at 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:70)
>       at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>       at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>       at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>       at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>       at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1707)
>       at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:83)
>       at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:133)
>       at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
>       at 
> org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3600)
>       at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3652)
>       at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3632)
>       at 
> org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:3894)
>       at 
> org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:248)
>       at 
> org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:231)
>       at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:388)
>       at org.apache.hadoop.hive.ql.metadata.Hive.create(Hive.java:332)
>       at org.apache.hadoop.hive.ql.metadata.Hive.getInternal(Hive.java:312)
>       at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:288)
>       at 
> org.apache.spark.sql.hive.client.HiveClientImpl.client(HiveClientImpl.scala:343)
>       at 
> org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:369)
>       at 
> org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$retryLocked$1(HiveClientImpl.scala:280)
>       at 
> org.apache.spark.sql.hive.client.HiveClientImpl.synchronizeOnObject(HiveClientImpl.scala:316)
>       at 
> org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:272)
>       at 
> org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:359)
>       at 
> org.apache.spark.sql.hive.client.HiveClientImpl.databaseExists(HiveClientImpl.scala:472)
>       at 
> org.apache.spark.sql.hive.client.PoolingHiveClient.$anonfun$databaseExists$1(PoolingHiveClient.scala:267)
>       at 
> org.apache.spark.sql.hive.client.PoolingHiveClient.$anonfun$databaseExists$1$adapted(PoolingHiveClient.scala:266)
>       at 
> org.apache.spark.sql.hive.client.PoolingHiveClient.withHiveClient(PoolingHiveClient.scala:112)
>       at 
> org.apache.spark.sql.hive.client.PoolingHiveClient.databaseExists(PoolingHiveClient.scala:266)
>       at 
> org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$databaseExists$1(HiveExternalCatalog.scala:286)
>       at 
> scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
>       at 
> org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$withClient$2(HiveExternalCatalog.scala:145)
>       at 
> org.apache.spark.sql.hive.HiveExternalCatalog.maybeSynchronized(HiveExternalCatalog.scala:106)
>       at 
> org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$withClient$1(HiveExternalCatalog.scala:144)
>       at 
> com.databricks.spark.util.NoopProgressReporter$.withStatusCode(ProgressReporter.scala:52)
>       at 
> com.databricks.spark.util.SparkDatabricksProgressReporter$.withStatusCode(ProgressReporter.scala:34)
>       at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:143)
>       at 
> org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:286)
>       at 
> org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:212)
>       at 
> org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:199)
>       at 
> org.apache.spark.sql.hive.HiveSessionStateBuilder.externalCatalog(HiveSessionStateBuilder.scala:47)
>       at 
> org.apache.spark.sql.hive.HiveSessionStateBuilder.$anonfun$catalog$1(HiveSessionStateBuilder.scala:62)
>       at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog$lzycompute(SessionCatalog.scala:94)
>       at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog(SessionCatalog.scala:94)
>       at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.listDatabases(SessionCatalog.scala:270)
>       at 
> org.apache.spark.sql.execution.datasources.v2.V2SessionCatalog.listNamespaces(V2SessionCatalog.scala:191)
>       at 
> org.apache.spark.sql.execution.datasources.v2.ShowNamespacesExec.run(ShowNamespacesExec.scala:43)
>       at 
> org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:39)
>       at 
> org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:39)
>       at 
> org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:45)
>       at 
> org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:231)
>       at 
> org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3612)
>       at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$5(SQLExecution.scala:115)
>       at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:246)
>       at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$1(SQLExecution.scala:100)
>       at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
>       at 
> org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:76)
>       at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:196)
>       at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3610)
>       at org.apache.spark.sql.Dataset.<init>(Dataset.scala:231)
>       at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:101)
>       at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
>       at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:98)
>       at 
> org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:662)
>       at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
>       at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:657)
>       at 
> $line50594476574342420814.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:24)
>       at 
> $line50594476574342420814.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:28)
>       at 
> $line50594476574342420814.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:30)
>       at 
> $line50594476574342420814.$read$$iw$$iw$$iw$$iw$$iw.<init>(<console>:32)
>       at $line50594476574342420814.$read$$iw$$iw$$iw$$iw.<init>(<console>:34)
>       at $line50594476574342420814.$read$$iw$$iw$$iw.<init>(<console>:36)
>       at $line50594476574342420814.$read$$iw$$iw.<init>(<console>:38)
>       at $line50594476574342420814.$read$$iw.<init>(<console>:40)
>       at $line50594476574342420814.$read.<init>(<console>:42)
>       at $line50594476574342420814.$read$.<init>(<console>:46)
>       at $line50594476574342420814.$read$.<clinit>(<console>)
>       at $line50594476574342420814.$eval$.$print$lzycompute(<console>:7)
>       at $line50594476574342420814.$eval$.$print(<console>:6)
>       at $line50594476574342420814.$eval.$print(<console>)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:745)
>       at 
> scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1021)
>       at 
> scala.tools.nsc.interpreter.IMain.$anonfun$interpret$1(IMain.scala:574)
>       at 
> scala.reflect.internal.util.ScalaClassLoader.asContext(ScalaClassLoader.scala:41)
>       at 
> scala.reflect.internal.util.ScalaClassLoader.asContext$(ScalaClassLoader.scala:37)
>       at 
> scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:41)
>       at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:573)
>       at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:600)
>       at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:570)
>       at 
> scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:894)
>       at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:762)
>       at scala.tools.nsc.interpreter.ILoop.processLine(ILoop.scala:464)
>       at scala.tools.nsc.interpreter.ILoop.loop(ILoop.scala:485)
>       at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:239)
>       at org.apache.spark.repl.Main$.doMain(Main.scala:78)
>       at org.apache.spark.repl.Main$.main(Main.scala:58)
>       at org.apache.spark.repl.Main.main(Main.scala)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>       at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:928)
>       at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>       at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>       at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>       at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
>       at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
>       at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: ERROR 42601: In an ALTER TABLE statement, the column 
> 'IS_REWRITE_ENABLED' has been specified as NOT NULL and either the DEFAULT 
> clause was not specified or was specified as DEFAULT NULL.
>       at org.apache.derby.iapi.error.StandardException.newException(Unknown 
> Source)
>       at org.apache.derby.iapi.error.StandardException.newException(Unknown 
> Source)
>       at 
> org.apache.derby.impl.sql.compile.ColumnDefinitionNode.bindAndValidateDefault(Unknown
>  Source)
>       at org.apache.derby.impl.sql.compile.TableElementList.validate(Unknown 
> Source)
>       at 
> org.apache.derby.impl.sql.compile.AlterTableNode.bindStatement(Unknown Source)
>       at org.apache.derby.impl.sql.GenericStatement.prepMinion(Unknown Source)
>       at org.apache.derby.impl.sql.GenericStatement.prepare(Unknown Source)
>       at 
> org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.prepareInternalStatement(Unknown
>  Source)
>       ... 157 more
> ...
> 20/03/09 13:57:05 ERROR ObjectStore: Version information found in metastore 
> differs 1.2.0 from expected schema version 2.3.0. Schema verififcation is 
> disabled hive.metastore.schema.verification
> 20/03/09 13:57:05 WARN ObjectStore: setMetaStoreSchemaVersion called but 
> recording version is disabled: version = 2.3.0, comment = Set by MetaStore 
> [email protected]
> {code}
> It would be great if there is a migration script to upgrade the metastore_db 
> from the older version to new version.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to