[ 
https://issues.apache.org/jira/browse/SPARK-31099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056465#comment-17056465
 ] 

Jungtaek Lim commented on SPARK-31099:
--------------------------------------

[~dongjoon]

Could you elaborate on your comment "Apache Spark 3.0 also doesn't support 
restarting from the old streaming checkpoint."? Spark 3.0 should certainly 
support the old checkpoint, except in some cases where we have to discard the 
old checkpoint to fix correctness issues.

> Create migration script for metastore_db
> ----------------------------------------
>
>                 Key: SPARK-31099
>                 URL: https://issues.apache.org/jira/browse/SPARK-31099
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Gengliang Wang
>            Priority: Major
>
> When a Derby database created by the Hive 1.2.x profile already exists (in 
> ./metastore_db), it fails to upgrade itself to the Hive 2.3.x profile.
> Repro steps:
> 1. Build OSS or DBR master with SBT using -Phive-1.2 -Phive 
> -Phive-thriftserver. Make sure there is no existing ./metastore_db directory 
> in the repo.
> 2. Run bin/spark-shell, and then spark.sql("show databases"). This will 
> populate the ./metastore_db directory, where the Derby-based metastore 
> database is hosted. This database is populated by Hive 1.2.x.
> 3. Re-build OSS or DBR master with SBT using -Phive -Phive-thriftserver (this 
> drops the Hive 1.2 profile, so the build uses the default Hive 2.3 profile).
> 4. Repeat Step (2) above. This makes Hive 2.3.x load the Derby database 
> created in Step (2), which triggers a schema upgrade; that is where the 
> following error is reported.
> 5. Delete the ./metastore_db directory and re-run Step (4). The error is no 
> longer reported.
> {code:java}
> 20/03/09 13:57:04 ERROR Datastore: Error thrown executing ALTER TABLE TBLS 
> ADD IS_REWRITE_ENABLED CHAR(1) NOT NULL CHECK (IS_REWRITE_ENABLED IN 
> ('Y','N')) : In an ALTER TABLE statement, the column 'IS_REWRITE_ENABLED' has 
> been specified as NOT NULL and either the DEFAULT clause was not specified or 
> was specified as DEFAULT NULL.
> java.sql.SQLSyntaxErrorException: In an ALTER TABLE statement, the column 
> 'IS_REWRITE_ENABLED' has been specified as NOT NULL and either the DEFAULT 
> clause was not specified or was specified as DEFAULT NULL.
>       at 
> org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
>       at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown 
> Source)
>       at 
> org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown 
> Source)
>       at 
> org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown 
> Source)
>       at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown 
> Source)
>       at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown 
> Source)
>       at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
>       at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
>       at com.jolbox.bonecp.StatementHandle.execute(StatementHandle.java:254)
>       at 
> org.datanucleus.store.rdbms.table.AbstractTable.executeDdlStatement(AbstractTable.java:879)
>       at 
> org.datanucleus.store.rdbms.table.AbstractTable.executeDdlStatementList(AbstractTable.java:830)
>       at 
> org.datanucleus.store.rdbms.table.TableImpl.validateColumns(TableImpl.java:257)
>       at 
> org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.performTablesValidation(RDBMSStoreManager.java:3398)
>       at 
> org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.run(RDBMSStoreManager.java:2896)
>       at 
> org.datanucleus.store.rdbms.AbstractSchemaTransaction.execute(AbstractSchemaTransaction.java:119)
>       at 
> org.datanucleus.store.rdbms.RDBMSStoreManager.manageClasses(RDBMSStoreManager.java:1627)
>       at 
> org.datanucleus.store.rdbms.RDBMSStoreManager.getDatastoreClass(RDBMSStoreManager.java:672)
>       at 
> org.datanucleus.store.rdbms.query.RDBMSQueryUtils.getStatementForCandidates(RDBMSQueryUtils.java:425)
>       at 
> org.datanucleus.store.rdbms.query.JDOQLQuery.compileQueryFull(JDOQLQuery.java:865)
>       at 
> org.datanucleus.store.rdbms.query.JDOQLQuery.compileInternal(JDOQLQuery.java:347)
>       at org.datanucleus.store.query.Query.executeQuery(Query.java:1816)
>       at org.datanucleus.store.query.Query.executeWithArray(Query.java:1744)
>       at org.datanucleus.store.query.Query.execute(Query.java:1726)
>       at org.datanucleus.api.jdo.JDOQuery.executeInternal(JDOQuery.java:374)
>       at org.datanucleus.api.jdo.JDOQuery.execute(JDOQuery.java:216)
>       at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.ensureDbInit(MetaStoreDirectSql.java:184)
>       at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.<init>(MetaStoreDirectSql.java:144)
>       at 
> org.apache.hadoop.hive.metastore.ObjectStore.initializeHelper(ObjectStore.java:410)
>       at 
> org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:342)
>       at 
> org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:303)
>       at 
> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
>       at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
>       at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:58)
>       at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:67)
>       at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStoreForConf(HiveMetaStore.java:628)
>       at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMSForConf(HiveMetaStore.java:594)
>       at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:588)
>       at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:655)
>       at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:431)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
>       at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
>       at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:79)
>       at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:92)
>       at 
> org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:6902)
>       at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:164)
>       at 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:70)
>       at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>       at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>       at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>       at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>       at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1707)
>       at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:83)
>       at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:133)
>       at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
>       at 
> org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3600)
>       at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3652)
>       at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3632)
>       at 
> org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:3894)
>       at 
> org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:248)
>       at 
> org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:231)
>       at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:388)
>       at org.apache.hadoop.hive.ql.metadata.Hive.create(Hive.java:332)
>       at org.apache.hadoop.hive.ql.metadata.Hive.getInternal(Hive.java:312)
>       at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:288)
>       at 
> org.apache.spark.sql.hive.client.HiveClientImpl.client(HiveClientImpl.scala:343)
>       at 
> org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:369)
>       at 
> org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$retryLocked$1(HiveClientImpl.scala:280)
>       at 
> org.apache.spark.sql.hive.client.HiveClientImpl.synchronizeOnObject(HiveClientImpl.scala:316)
>       at 
> org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:272)
>       at 
> org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:359)
>       at 
> org.apache.spark.sql.hive.client.HiveClientImpl.databaseExists(HiveClientImpl.scala:472)
>       at 
> org.apache.spark.sql.hive.client.PoolingHiveClient.$anonfun$databaseExists$1(PoolingHiveClient.scala:267)
>       at 
> org.apache.spark.sql.hive.client.PoolingHiveClient.$anonfun$databaseExists$1$adapted(PoolingHiveClient.scala:266)
>       at 
> org.apache.spark.sql.hive.client.PoolingHiveClient.withHiveClient(PoolingHiveClient.scala:112)
>       at 
> org.apache.spark.sql.hive.client.PoolingHiveClient.databaseExists(PoolingHiveClient.scala:266)
>       at 
> org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$databaseExists$1(HiveExternalCatalog.scala:286)
>       at 
> scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
>       at 
> org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$withClient$2(HiveExternalCatalog.scala:145)
>       at 
> org.apache.spark.sql.hive.HiveExternalCatalog.maybeSynchronized(HiveExternalCatalog.scala:106)
>       at 
> org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$withClient$1(HiveExternalCatalog.scala:144)
>       at 
> com.databricks.spark.util.NoopProgressReporter$.withStatusCode(ProgressReporter.scala:52)
>       at 
> com.databricks.spark.util.SparkDatabricksProgressReporter$.withStatusCode(ProgressReporter.scala:34)
>       at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:143)
>       at 
> org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:286)
>       at 
> org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:212)
>       at 
> org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:199)
>       at 
> org.apache.spark.sql.hive.HiveSessionStateBuilder.externalCatalog(HiveSessionStateBuilder.scala:47)
>       at 
> org.apache.spark.sql.hive.HiveSessionStateBuilder.$anonfun$catalog$1(HiveSessionStateBuilder.scala:62)
>       at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog$lzycompute(SessionCatalog.scala:94)
>       at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog(SessionCatalog.scala:94)
>       at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.listDatabases(SessionCatalog.scala:270)
>       at 
> org.apache.spark.sql.execution.datasources.v2.V2SessionCatalog.listNamespaces(V2SessionCatalog.scala:191)
>       at 
> org.apache.spark.sql.execution.datasources.v2.ShowNamespacesExec.run(ShowNamespacesExec.scala:43)
>       at 
> org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:39)
>       at 
> org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:39)
>       at 
> org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:45)
>       at 
> org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:231)
>       at 
> org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3612)
>       at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$5(SQLExecution.scala:115)
>       at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:246)
>       at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$1(SQLExecution.scala:100)
>       at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
>       at 
> org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:76)
>       at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:196)
>       at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3610)
>       at org.apache.spark.sql.Dataset.<init>(Dataset.scala:231)
>       at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:101)
>       at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
>       at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:98)
>       at 
> org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:662)
>       at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
>       at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:657)
>       at 
> $line50594476574342420814.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:24)
>       at 
> $line50594476574342420814.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:28)
>       at 
> $line50594476574342420814.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:30)
>       at 
> $line50594476574342420814.$read$$iw$$iw$$iw$$iw$$iw.<init>(<console>:32)
>       at $line50594476574342420814.$read$$iw$$iw$$iw$$iw.<init>(<console>:34)
>       at $line50594476574342420814.$read$$iw$$iw$$iw.<init>(<console>:36)
>       at $line50594476574342420814.$read$$iw$$iw.<init>(<console>:38)
>       at $line50594476574342420814.$read$$iw.<init>(<console>:40)
>       at $line50594476574342420814.$read.<init>(<console>:42)
>       at $line50594476574342420814.$read$.<init>(<console>:46)
>       at $line50594476574342420814.$read$.<clinit>(<console>)
>       at $line50594476574342420814.$eval$.$print$lzycompute(<console>:7)
>       at $line50594476574342420814.$eval$.$print(<console>:6)
>       at $line50594476574342420814.$eval.$print(<console>)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:745)
>       at 
> scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1021)
>       at 
> scala.tools.nsc.interpreter.IMain.$anonfun$interpret$1(IMain.scala:574)
>       at 
> scala.reflect.internal.util.ScalaClassLoader.asContext(ScalaClassLoader.scala:41)
>       at 
> scala.reflect.internal.util.ScalaClassLoader.asContext$(ScalaClassLoader.scala:37)
>       at 
> scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:41)
>       at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:573)
>       at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:600)
>       at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:570)
>       at 
> scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:894)
>       at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:762)
>       at scala.tools.nsc.interpreter.ILoop.processLine(ILoop.scala:464)
>       at scala.tools.nsc.interpreter.ILoop.loop(ILoop.scala:485)
>       at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:239)
>       at org.apache.spark.repl.Main$.doMain(Main.scala:78)
>       at org.apache.spark.repl.Main$.main(Main.scala:58)
>       at org.apache.spark.repl.Main.main(Main.scala)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>       at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:928)
>       at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>       at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>       at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>       at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
>       at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
>       at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: ERROR 42601: In an ALTER TABLE statement, the column 
> 'IS_REWRITE_ENABLED' has been specified as NOT NULL and either the DEFAULT 
> clause was not specified or was specified as DEFAULT NULL.
>       at org.apache.derby.iapi.error.StandardException.newException(Unknown 
> Source)
>       at org.apache.derby.iapi.error.StandardException.newException(Unknown 
> Source)
>       at 
> org.apache.derby.impl.sql.compile.ColumnDefinitionNode.bindAndValidateDefault(Unknown
>  Source)
>       at org.apache.derby.impl.sql.compile.TableElementList.validate(Unknown 
> Source)
>       at 
> org.apache.derby.impl.sql.compile.AlterTableNode.bindStatement(Unknown Source)
>       at org.apache.derby.impl.sql.GenericStatement.prepMinion(Unknown Source)
>       at org.apache.derby.impl.sql.GenericStatement.prepare(Unknown Source)
>       at 
> org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.prepareInternalStatement(Unknown
>  Source)
>       ... 157 more
> ...
> 20/03/09 13:57:05 ERROR ObjectStore: Version information found in metastore 
> differs 1.2.0 from expected schema version 2.3.0. Schema verififcation is 
> disabled hive.metastore.schema.verification
> 20/03/09 13:57:05 WARN ObjectStore: setMetaStoreSchemaVersion called but 
> recording version is disabled: version = 2.3.0, comment = Set by MetaStore 
> [email protected]
> {code}
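The error message states Derby's rule directly: ALTER TABLE ... ADD cannot declare a column NOT NULL unless a non-null DEFAULT is also given. A migration for this column therefore has to either supply such a DEFAULT or add the column as nullable, backfill it, and only then tighten it to NOT NULL. The sketch below builds the second variant as plain DDL strings. The class and method names, the unnamed CHECK constraint, and the choice of 'N' as the backfill value are assumptions for illustration; this is not taken from Hive's own upgrade scripts.

```java
import java.util.List;

/**
 * Hypothetical helper (not part of Hive or Spark): builds Derby DDL that adds a
 * NOT NULL CHAR(1) 'Y'/'N' flag column without relying on a DEFAULT clause.
 */
public class DerbyNotNullColumnSketch {

    public static List<String> addNotNullFlagColumn(String table, String column, char backfill) {
        return List.of(
            // Step 1: add the column as nullable; Derby accepts this without a DEFAULT.
            "ALTER TABLE " + table + " ADD " + column + " CHAR(1)",
            // Step 2: give every existing row a value so NOT NULL can be enforced.
            "UPDATE " + table + " SET " + column + " = '" + backfill + "'",
            // Step 3: tighten the column to NOT NULL now that no row holds NULL.
            "ALTER TABLE " + table + " ALTER COLUMN " + column + " NOT NULL",
            // Step 4: restore the 'Y'/'N' check from the original failing statement.
            "ALTER TABLE " + table + " ADD CHECK (" + column + " IN ('Y','N'))");
    }

    public static void main(String[] args) {
        // Table and column names mirror the failing statement in the log above.
        for (String stmt : addNotNullFlagColumn("TBLS", "IS_REWRITE_ENABLED", 'N')) {
            System.out.println(stmt + ";");
        }
    }
}
```

Executing these statements in order through a JDBC Statement against the embedded Derby database would be one way to apply them; supplying a non-null DEFAULT in the original ADD statement is the other option the error message points to.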
> It would be great if there were a migration script to upgrade the 
> metastore_db from the older version to the new version.
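The log shows the metastore recorded schema version 1.2.0 while Hive 2.3.x expects 2.3.0, so a migration script would most likely apply a chain of per-release upgrades rather than a single jump. The sketch below plans that chain. The step list and the `upgrade-<from>-to-<to>.<db>.sql` file names are assumptions modeled on the naming of Hive's schema scripts and should be checked against the Hive release actually on the classpath; the planner class itself is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical planner: lists the ordered upgrade scripts needed between two schema versions. */
public class MetastoreUpgradePlanSketch {

    // Assumed ordered upgrade steps (from, to); verify against the actual Hive release.
    private static final String[][] STEPS = {
        {"1.2.0", "2.0.0"},
        {"2.0.0", "2.1.0"},
        {"2.1.0", "2.2.0"},
        {"2.2.0", "2.3.0"},
    };

    public static List<String> plan(String from, String to, String dbType) {
        List<String> scripts = new ArrayList<>();
        String current = from;
        while (!current.equals(to)) {
            String next = null;
            for (String[] step : STEPS) {
                if (step[0].equals(current)) {
                    next = step[1];
                    break;
                }
            }
            if (next == null) {
                // Unknown version, or target not reachable by moving forward.
                throw new IllegalArgumentException("No upgrade path from " + current + " to " + to);
            }
            // Assumed file name pattern: upgrade-<from>-to-<to>.<dbType>.sql
            scripts.add("upgrade-" + current + "-to-" + next + "." + dbType + ".sql");
            current = next;
        }
        return scripts;
    }

    public static void main(String[] args) {
        // Versions from the log: the metastore recorded 1.2.0, Hive 2.3.x expects 2.3.0.
        plan("1.2.0", "2.3.0", "derby").forEach(System.out::println);
    }
}
```

Walking the steps one by one keeps each script small and lets an upgrade resume from whatever version the metastore currently records.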



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
