ccoffline commented on a change in pull request #6416:
URL: https://github.com/apache/incubator-doris/pull/6416#discussion_r695365237
##########
File path: fe/fe-core/src/main/java/org/apache/doris/alter/MaterializedViewHandler.java
##########
@@ -1165,15 +1152,15 @@ private void getOldAlterJobInfos(Database db, List<List<Comparable>> rollupJobIn
for (AlterJob selectedJob : jobs) {
- OlapTable olapTable = (OlapTable) db.getTable(selectedJob.getTableId());
- if (olapTable == null) {
- continue;
- }
- olapTable.readLock();
try {
- selectedJob.getJobInfo(rollupJobInfos, olapTable);
- } finally {
- olapTable.readUnlock();
+ OlapTable olapTable = db.getTableOrMetaException(selectedJob.getTableId(), Table.TableType.OLAP);
+ olapTable.readLock();
+ try {
+ selectedJob.getJobInfo(rollupJobInfos, olapTable);
+ } finally {
+ olapTable.readUnlock();
+ }
+ } catch (MetaNotFoundException ignored) {
Review comment:
This is equivalent to `continue`.
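A minimal sketch of why the two forms behave the same. `MetaNotFoundException` is a local stand-in for the Doris class, and `lookupTable`, `withContinue` and `withCatch` are hypothetical names. The `try`/`catch` makes up the whole loop body, so an ignored exception falls through to the next iteration, just as the old null check plus `continue` did.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Local stand-in for Doris's MetaNotFoundException (assumption, not the real class).
class MetaNotFoundException extends Exception {
    MetaNotFoundException(String msg) {
        super(msg);
    }
}

class ContinueEquivalence {
    // Hypothetical lookup that throws instead of returning null.
    static String lookupTable(Map<Long, String> tables, long id) throws MetaNotFoundException {
        String t = tables.get(id);
        if (t == null) {
            throw new MetaNotFoundException("table " + id + " does not exist");
        }
        return t;
    }

    // Old style: explicit null check followed by continue.
    static List<String> withContinue(Map<Long, String> tables, List<Long> ids) {
        List<String> out = new ArrayList<>();
        for (long id : ids) {
            String t = tables.get(id);
            if (t == null) {
                continue;
            }
            out.add(t);
        }
        return out;
    }

    // New style: nothing follows the try/catch in the loop body,
    // so swallowing the exception moves on to the next id.
    static List<String> withCatch(Map<Long, String> tables, List<Long> ids) {
        List<String> out = new ArrayList<>();
        for (long id : ids) {
            try {
                out.add(lookupTable(tables, id));
            } catch (MetaNotFoundException ignored) {
                // equivalent to continue
            }
        }
        return out;
    }
}
```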
##########
File path: fe/fe-core/src/main/java/org/apache/doris/alter/MaterializedViewHandler.java
##########
@@ -883,26 +878,20 @@ private void removeJobFromRunningQueue(AlterJobV2 alterJob) {
}
private void changeTableStatus(long dbId, long tableId, OlapTableState olapTableState) {
- Database db = Catalog.getCurrentCatalog().getDb(dbId);
- if (db == null) {
- LOG.warn("db {} has been dropped when changing table {} status after rollup job done",
- dbId, tableId);
- return;
- }
- OlapTable tbl = (OlapTable) db.getTable(tableId);
- if (tbl == null) {
- LOG.warn("table {} has been dropped when changing table status after rollup job done",
- tableId);
- return;
- }
- tbl.writeLock();
try {
- if (tbl.getState() == olapTableState) {
- return;
+ Database db = Catalog.getCurrentCatalog().getDbOrMetaException(dbId);
+ OlapTable olapTable = db.getTableOrMetaException(tableId, Table.TableType.OLAP);
+ olapTable.writeLock();
+ try {
+ if (olapTable.getState() == olapTableState) {
+ return;
+ }
+ olapTable.setState(olapTableState);
+ } finally {
+ olapTable.writeUnlock();
}
- tbl.setState(olapTableState);
- } finally {
- tbl.writeUnlock();
+ } catch (MetaNotFoundException e) {
+ LOG.warn("[INCONSISTENT META] changing table status failed after rollup job done", e);
Review comment:
This is called by `replayAlterJobV2()` and `onJobDone()`, and neither of them should throw this exception.
##########
File path: fe/fe-core/src/main/java/org/apache/doris/alter/Alter.java
##########
@@ -260,17 +251,11 @@ private void processModifyColumnComment(Database db, OlapTable tbl, List<AlterCl
}
}
- public void replayModifyComment(ModifyCommentOperationLog operation) {
+ public void replayModifyComment(ModifyCommentOperationLog operation) throws MetaNotFoundException {
Review comment:
That is worth considering. But having "replay" throw any exception is a very big risk: it would crash every FE, and there would be no way to recover. These `MetaNotFoundException`s are mostly thrown by getDb and getTable because lock inconsistencies leave the edit logs out of order. Semantically, this exception means that some metadata the edit log wants to modify is missing during replay, so the replay cannot continue. As with getDb/getTable, this kind of inconsistency cannot be recovered from anyway, and in theory these edit logs only affect the metadata that was lost.
I prefer `MetaNotFoundException`, which signals that metadata has been lost, over other exceptions that might be confusing. The inconsistency can be tracked down from the warning log.
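The policy above can be sketched as follows: a missing-metadata failure during replay is logged with an `[INCONSISTENT META]` tag and skipped, and any other exception still aborts. This is a minimal sketch under stated assumptions. `MetaNotFoundException` is a local stand-in, `JournalOp` and `replayAll` are hypothetical names, and `java.util.logging` stands in for Doris's logger. In the PR itself the corresponding catch is in `EditLog.loadJournal()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

// Local stand-in for Doris's MetaNotFoundException (assumption).
class MetaNotFoundException extends Exception {
    MetaNotFoundException(String msg) {
        super(msg);
    }
}

// Hypothetical journal operation.
interface JournalOp {
    void replay() throws MetaNotFoundException;
}

class ReplayLoop {
    private static final Logger LOG = Logger.getLogger(ReplayLoop.class.getName());

    // Replays every op and returns the indexes that were skipped.
    // Only MetaNotFoundException is caught; any other exception propagates.
    static List<Integer> replayAll(List<JournalOp> ops) {
        List<Integer> skipped = new ArrayList<>();
        for (int i = 0; i < ops.size(); i++) {
            try {
                ops.get(i).replay();
            } catch (MetaNotFoundException e) {
                LOG.log(Level.WARNING, "[INCONSISTENT META] replay failed at op " + i + ": " + e.getMessage(), e);
                skipped.add(i);
            }
        }
        return skipped;
    }
}
```

Catching only this one exception type keeps "metadata is missing" apart from real corruption, which still surfaces as a failure.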
##########
File path: fe/fe-core/src/main/java/org/apache/doris/alter/RollupJobV2.java
##########
@@ -667,22 +638,26 @@ private void replayCancelled(RollupJobV2 replayedJob) {
@Override
public void replay(AlterJobV2 replayedJob) {
- RollupJobV2 replayedRollupJob = (RollupJobV2) replayedJob;
- switch (replayedJob.jobState) {
- case PENDING:
- replayCreateJob(replayedRollupJob);
- break;
- case WAITING_TXN:
- replayPendingJob(replayedRollupJob);
- break;
- case FINISHED:
- replayRunningJob(replayedRollupJob);
- break;
- case CANCELLED:
- replayCancelled(replayedRollupJob);
- break;
- default:
- break;
+ try {
+ RollupJobV2 replayedRollupJob = (RollupJobV2) replayedJob;
+ switch (replayedJob.jobState) {
+ case PENDING:
+ replayCreateJob(replayedRollupJob);
+ break;
+ case WAITING_TXN:
+ replayPendingJob(replayedRollupJob);
+ break;
+ case FINISHED:
+ replayRunningJob(replayedRollupJob);
+ break;
+ case CANCELLED:
+ replayCancelled(replayedRollupJob);
+ break;
+ default:
+ break;
+ }
+ } catch (MetaNotFoundException e) {
+ LOG.warn("[INCONSISTENT META] replay rollup job failed {}", replayedJob.getJobId(), e);
Review comment:
It is called by `replayAlterJobV2`. I'm afraid this would behave differently from the original code:
https://github.com/apache/incubator-doris/blob/138e7e896dee6842d8af7583d59288190371cd86/fe/fe-core/src/main/java/org/apache/doris/alter/AlterHandler.java#L460-L469
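The linked `AlterHandler` code is not reproduced here, so this is only a generic sketch with hypothetical names of why the placement of the `catch` matters. If `replay()` swallows the exception itself, any follow-up steps in its caller still run. If the exception propagates, the caller skips them. Which variant matches the original behavior depends on what the caller does after `replay()`.

```java
import java.util.ArrayList;
import java.util.List;

// Local stand-in for Doris's MetaNotFoundException (assumption).
class MetaNotFoundException extends Exception {
    MetaNotFoundException(String msg) {
        super(msg);
    }
}

class CatchPlacement {
    // Records which steps actually ran.
    final List<String> trace = new ArrayList<>();

    private void doReplay(boolean metaMissing) throws MetaNotFoundException {
        if (metaMissing) {
            throw new MetaNotFoundException("table missing");
        }
        trace.add("replayed");
    }

    // Variant A: the callee swallows the exception.
    void replaySwallowing(boolean metaMissing) {
        try {
            doReplay(metaMissing);
        } catch (MetaNotFoundException e) {
            trace.add("logged");
        }
    }

    // Hypothetical caller: its bookkeeping always runs with variant A.
    void callerWithSwallowingCallee(boolean metaMissing) {
        replaySwallowing(metaMissing);
        trace.add("bookkeeping");
    }

    // Variant B: the exception reaches the caller, so bookkeeping is skipped.
    void callerWithPropagatingCallee(boolean metaMissing) {
        try {
            doReplay(metaMissing);
            trace.add("bookkeeping");
        } catch (MetaNotFoundException e) {
            trace.add("logged");
        }
    }
}
```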
##########
File path: fe/fe-core/src/main/java/org/apache/doris/alter/AlterHandler.java
##########
@@ -302,40 +302,37 @@ protected void jobDone(AlterJob alterJob) {
}
}
- public void replayInitJob(AlterJob alterJob, Catalog catalog) {
- Database db = catalog.getDb(alterJob.getDbId());
+ public void replayInitJob(AlterJob alterJob, Catalog catalog) throws MetaNotFoundException {
+ Database db = catalog.getDbOrMetaException(alterJob.getDbId());
alterJob.replayInitJob(db);
// add rollup job
addAlterJob(alterJob);
}
- public void replayFinishing(AlterJob alterJob, Catalog catalog) {
- Database db = catalog.getDb(alterJob.getDbId());
+ public void replayFinishing(AlterJob alterJob, Catalog catalog) throws MetaNotFoundException {
+ Database db = catalog.getDbOrMetaException(alterJob.getDbId());
alterJob.replayFinishing(db);
alterJob.setState(JobState.FINISHING);
// !!! the alter job should add to the cache again, because the alter job is deserialized from journal
// it is a different object compared to the cache
addAlterJob(alterJob);
}
- public void replayFinish(AlterJob alterJob, Catalog catalog) {
- Database db = catalog.getDb(alterJob.getDbId());
+ public void replayFinish(AlterJob alterJob, Catalog catalog) throws MetaNotFoundException {
+ Database db = catalog.getDbOrMetaException(alterJob.getDbId());
alterJob.replayFinish(db);
alterJob.setState(JobState.FINISHED);
jobDone(alterJob);
}
- public void replayCancel(AlterJob alterJob, Catalog catalog) {
+ public void replayCancel(AlterJob alterJob, Catalog catalog) throws MetaNotFoundException {
removeAlterJob(alterJob.getTableId());
alterJob.setState(JobState.CANCELLED);
- Database db = catalog.getDb(alterJob.getDbId());
- if (db != null) {
- // we log rollup job cancelled even if db is dropped.
- // so check db != null here
- alterJob.replayCancel(db);
- }
-
+ // we log rollup job cancelled even if db is dropped.
+ // so check db != null here
+ Database db = catalog.getDbOrMetaException(alterJob.getDbId());
+ alterJob.replayCancel(db);
Review comment:
I'll fix this.
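One possible way to keep the pre-PR control flow from the removed lines, sketched with hypothetical stand-in types (`AlterJobStub`, `CancelReplay`, `existingDbIds`). The cancel log is written even after the database has been dropped. So the job is always removed and marked cancelled, and only the database-specific step is skipped when the database is gone. Nothing is thrown.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified alter job.
class AlterJobStub {
    final long dbId;
    final long tableId;
    String state = "RUNNING";
    boolean dbSideCancelled = false;

    AlterJobStub(long dbId, long tableId) {
        this.dbId = dbId;
        this.tableId = tableId;
    }
}

class CancelReplay {
    final Map<Long, AlterJobStub> jobsByTable = new HashMap<>();
    final Set<Long> existingDbIds = new HashSet<>();

    void replayCancel(AlterJobStub job) {
        jobsByTable.remove(job.tableId);
        job.state = "CANCELLED";
        // A dropped db is expected here, so skip only the db-specific step.
        if (existingDbIds.contains(job.dbId)) {
            job.dbSideCancelled = true; // stands in for alterJob.replayCancel(db)
        }
    }
}
```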
##########
File path: fe/fe-core/src/main/java/org/apache/doris/alter/RollupJobV2.java
##########
@@ -812,10 +787,14 @@ public void gsonPostProcess() throws IOException {
return;
}
// parse the define stmt to schema
- SqlParser parser = new SqlParser(new SqlScanner(new StringReader(origStmt.originStmt),
- SqlModeHelper.MODE_DEFAULT));
+ SqlParser parser = new SqlParser(new SqlScanner(new StringReader(origStmt.originStmt), SqlModeHelper.MODE_DEFAULT));
ConnectContext connectContext = new ConnectContext();
- Database db = Catalog.getCurrentCatalog().getDb(dbId);
+ Database db;
+ try {
+ db = Catalog.getCurrentCatalog().getDbOrMetaException(dbId);
+ } catch (MetaNotFoundException e) {
+ throw new IOException("error happens when parsing create materialized view stmt: " + origStmt, e);
Review comment:
The original code would throw an NPE here, and I really don't know what the right behavior is.
##########
File path: fe/fe-core/src/main/java/org/apache/doris/alter/SchemaChangeHandler.java
##########
@@ -1889,17 +1889,14 @@ public void cancel(CancelStmt stmt) throws DdlException {
Preconditions.checkState(!Strings.isNullOrEmpty(dbName));
Preconditions.checkState(!Strings.isNullOrEmpty(tableName));
- Database db = Catalog.getCurrentCatalog().getDb(dbName);
- if (db == null) {
- throw new DdlException("Database[" + dbName + "] does not exist");
- }
+ Database db = Catalog.getCurrentCatalog().getDbOrDdlException(dbName);
AlterJob schemaChangeJob = null;
AlterJobV2 schemaChangeJobV2 = null;
- OlapTable olapTable = null;
+ OlapTable olapTable;
try {
- olapTable = (OlapTable) db.getTableOrThrowException(tableName, Table.TableType.OLAP);
+ olapTable = db.getTableOrMetaException(tableName, Table.TableType.OLAP);
Review comment:
OK, that makes sense.
##########
File path: fe/fe-core/src/main/java/org/apache/doris/alter/SchemaChangeJobV2.java
##########
@@ -218,9 +215,9 @@ protected void runPendingJob() throws AlterCancelException {
}
MarkedCountDownLatch<Long, Long> countDownLatch = new MarkedCountDownLatch<>(totalReplicaNum);
- OlapTable tbl = null;
+ OlapTable tbl;
try {
- tbl = (OlapTable) db.getTableOrThrowException(tableId, TableType.OLAP);
+ tbl = db.getTableOrMetaException(tableId, TableType.OLAP);
Review comment:
ok
##########
File path: fe/fe-core/src/main/java/org/apache/doris/alter/SchemaChangeJobV2.java
##########
@@ -370,14 +367,11 @@ protected void runWaitingTxnJob() throws AlterCancelException {
}
LOG.info("previous transactions are all finished, begin to send schema change tasks. job: {}", jobId);
- Database db = Catalog.getCurrentCatalog().getDb(dbId);
- if (db == null) {
- throw new AlterCancelException("Databasee " + dbId + " does not exist");
- }
+ Database db = Catalog.getCurrentCatalog().getDbOrException(dbId, s -> new AlterCancelException("Database " + s + " does not exist"));
- OlapTable tbl = null;
+ OlapTable tbl;
try {
- tbl = (OlapTable) db.getTableOrThrowException(tableId, TableType.OLAP);
+ tbl = db.getTableOrMetaException(tableId, TableType.OLAP);
Review comment:
ok
##########
File path: fe/fe-core/src/main/java/org/apache/doris/backup/RestoreJob.java
##########
@@ -1048,11 +1049,17 @@ private boolean downloadAndDeserializeMetaInfo() {
}
private void replayCheckAndPrepareMeta() {
- Database db = catalog.getDb(dbId);
+ Database db;
+ try {
+ db = catalog.getDbOrMetaException(dbId);
Review comment:
It is called by `replayRun`, which is in turn called by the code below, and that code does further processing afterwards.
https://github.com/apache/incubator-doris/blob/138e7e896dee6842d8af7583d59288190371cd86/fe/fe-core/src/main/java/org/apache/doris/backup/BackupHandler.java#L643-L655
##########
File path: fe/fe-core/src/main/java/org/apache/doris/catalog/Catalog.java
##########
@@ -4548,72 +4476,43 @@ private void unprotectUpdateReplica(ReplicaPersistInfo info) {
replica.setBad(false);
}
- public void replayAddReplica(ReplicaPersistInfo info) {
- Database db = getDb(info.getDbId());
- OlapTable olapTable = (OlapTable) db.getTable(info.getTableId());
- if (olapTable == null) {
- /**
- * Same as replayUpdateReplica()
- */
- LOG.warn("Olap table is null when the add replica log is replayed, {}", info);
- return;
- }
+ public void replayAddReplica(ReplicaPersistInfo info) throws MetaNotFoundException {
+ Database db = this.getDbOrMetaException(info.getDbId());
+ OlapTable olapTable = db.getTableOrMetaException(info.getTableId(), TableType.OLAP);
olapTable.writeLock();
try {
- unprotectAddReplica(info);
+ unprotectAddReplica(olapTable, info);
} finally {
olapTable.writeUnlock();
}
}
- public void replayUpdateReplica(ReplicaPersistInfo info) {
- Database db = getDb(info.getDbId());
- OlapTable olapTable = (OlapTable) db.getTable(info.getTableId());
- if (olapTable == null) {
Review comment:
ok
##########
File path: fe/fe-core/src/main/java/org/apache/doris/httpv2/rest/TableRowCountAction.java
##########
@@ -61,15 +61,12 @@ public Object count(
String fullDbName = getFullDbName(dbName);
// check privilege for select, otherwise return HTTP 401
checkTblAuth(ConnectContext.get().getCurrentUserIdentity(), fullDbName, tblName, PrivPredicate.SELECT);
- Database db = Catalog.getCurrentCatalog().getDb(fullDbName);
- if (db == null) {
- return ResponseEntityBuilder.okWithCommonError("Database [" + dbName + "] " + "does not exists");
- }
- OlapTable olapTable = null;
+ OlapTable olapTable;
try {
- olapTable = (OlapTable) db.getTableOrThrowException(tblName, Table.TableType.OLAP);
+ Database db = Catalog.getCurrentCatalog().getDbOrMetaException(fullDbName);
+ olapTable = db.getTableOrMetaException(tblName, Table.TableType.OLAP);
} catch (MetaNotFoundException e) {
- return ResponseEntityBuilder.okWithCommonError(e.getMessage());
+ return ResponseEntityBuilder.notFound(e.getMessage());
Review comment:
ok
##########
File path: fe/fe-core/src/main/java/org/apache/doris/catalog/Catalog.java
##########
@@ -6341,12 +6264,12 @@ public long loadCluster(DataInputStream dis, long checksum) throws IOException,
// for adding BE to some Cluster, but loadCluster is after loadBackend.
cluster.setBackendIdList(latestBackendIds);
- String dbName = InfoSchemaDb.getFullInfoSchemaDbName(cluster.getName());
+ String dbName = InfoSchemaDb.getFullInfoSchemaDbName(cluster.getName());
InfoSchemaDb db;
// Use real Catalog instance to avoid InfoSchemaDb id continuously increment
// when checkpoint thread load image.
- if (Catalog.getServingCatalog().getFullNameToDb().containsKey(dbName)) {
- db = (InfoSchemaDb)Catalog.getServingCatalog().getFullNameToDb().get(dbName);
+ if (Catalog.getCurrentCatalog().getFullNameToDb().containsKey(dbName)) {
Review comment:
This is an unintended change. I'll fix it.
##########
File path: fe/fe-core/src/main/java/org/apache/doris/persist/EditLog.java
##########
@@ -865,6 +862,8 @@ public static void loadJournal(Catalog catalog, JournalEntity journal) {
throw e;
}
}
+ } catch (MetaNotFoundException e) {
+ LOG.warn("[INCONSISTENT META] replay failed {}", journal, e);
Review comment:
I considered printing the original journal content, but gave up because the deserialization is complicated. `JournalEntity` already implements `toString()`, which prints the op code. It would be better for all `Writable` objects to implement `toString()`.
I'll put `e.getMessage()` on the same log line and keep the exception stack trace.
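A minimal sketch of the logging approach described above. It assumes `java.util.logging` in place of the logger Doris actually uses, and `ModifyCommentLog` is a hypothetical `Writable`-style payload, not the real Doris class. A readable `toString()` lets the entry be printed without re-deserializing it. The summary line includes `e.getMessage()`, and the throwable is passed separately so the stack trace is kept.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical journal payload; real Doris payloads implement Writable.
class ModifyCommentLog {
    final long dbId;
    final long tableId;
    final String comment;

    ModifyCommentLog(long dbId, long tableId, String comment) {
        this.dbId = dbId;
        this.tableId = tableId;
        this.comment = comment;
    }

    // Human-readable form for log messages.
    @Override
    public String toString() {
        return "ModifyCommentLog{dbId=" + dbId + ", tableId=" + tableId + ", comment='" + comment + "'}";
    }
}

class ReplayLogging {
    private static final Logger LOG = Logger.getLogger(ReplayLogging.class.getName());

    // One-line summary: journal content plus the exception message.
    static String summary(Object journal, Exception e) {
        return "[INCONSISTENT META] replay failed " + journal + ": " + e.getMessage();
    }

    static void warnReplayFailed(Object journal, Exception e) {
        // Passing the throwable keeps the full stack trace in the log record.
        LOG.log(Level.WARNING, summary(journal, e), e);
    }
}
```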
##########
File path: fe/fe-core/src/main/java/org/apache/doris/alter/SchemaChangeJobV2.java
##########
@@ -218,9 +215,9 @@ protected void runPendingJob() throws AlterCancelException {
}
MarkedCountDownLatch<Long, Long> countDownLatch = new MarkedCountDownLatch<>(totalReplicaNum);
- OlapTable tbl = null;
+ OlapTable tbl;
try {
- tbl = (OlapTable) db.getTableOrThrowException(tableId, TableType.OLAP);
+ tbl = db.getTableOrMetaException(tableId, TableType.OLAP);
Review comment:
This checks both that the table exists and that it is an OLAP table, so it is simpler to write the code this way.
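A sketch of how a single lookup can check both conditions and return the narrowed type, so call sites no longer need a null check and an `(OlapTable)` cast. All types here (`Table`, `OlapTable`, `Db`, `TableType`) are simplified hypothetical stand-ins, and the generic signature is an assumption, not necessarily the one the PR adds.

```java
import java.util.HashMap;
import java.util.Map;

// Local stand-in for Doris's MetaNotFoundException (assumption).
class MetaNotFoundException extends Exception {
    MetaNotFoundException(String msg) {
        super(msg);
    }
}

enum TableType { OLAP, MYSQL }

// Hypothetical, simplified table hierarchy.
class Table {
    final long id;
    final TableType type;

    Table(long id, TableType type) {
        this.id = id;
        this.type = type;
    }
}

class OlapTable extends Table {
    OlapTable(long id) {
        super(id, TableType.OLAP);
    }
}

class Db {
    private final Map<Long, Table> tables = new HashMap<>();

    void add(Table t) {
        tables.put(t.id, t);
    }

    // Checks existence, then type, then returns the table as the caller's type.
    // The cast is unchecked: it relies on the type tag matching the subclass.
    @SuppressWarnings("unchecked")
    <T extends Table> T getTableOrMetaException(long tableId, TableType type) throws MetaNotFoundException {
        Table t = tables.get(tableId);
        if (t == null) {
            throw new MetaNotFoundException("table " + tableId + " does not exist");
        }
        if (t.type != type) {
            throw new MetaNotFoundException("table " + tableId + " is " + t.type + ", not " + type);
        }
        return (T) t;
    }
}
```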
##########
File path: fe/fe-core/src/main/java/org/apache/doris/alter/Alter.java
##########
@@ -260,17 +251,11 @@ private void processModifyColumnComment(Database db, OlapTable tbl, List<AlterCl
}
}
- public void replayModifyComment(ModifyCommentOperationLog operation) {
+ public void replayModifyComment(ModifyCommentOperationLog operation) throws MetaNotFoundException {
Review comment:
@caiconghui we have no guarantee that the edit logs are in order. These checks are there to prevent the worst from happening.
Out-of-order edit logs may cause metadata inconsistency, which has to be fixed sooner or later. We are exploring ways to ensure consistency while minimizing the cost. Until then, we have to check for every possible NPE or else let all the FEs crash.
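A simplified, hypothetical model of the out-of-order case. A comment-modification op is replayed before the op that creates its table. Without the check, the lookup returns null and the dereference throws an NPE, which would bring the FE down. With the check, the same situation surfaces as a `MetaNotFoundException` that the replay loop can log and skip. `OutOfOrderReplay` and its methods are illustrative names only; the real replay methods operate on `Database`/`OlapTable`.

```java
import java.util.HashMap;
import java.util.Map;

// Local stand-in for Doris's MetaNotFoundException (assumption).
class MetaNotFoundException extends Exception {
    MetaNotFoundException(String msg) {
        super(msg);
    }
}

class OutOfOrderReplay {
    // Hypothetical catalog: table id -> table comment.
    final Map<Long, StringBuilder> tables = new HashMap<>();

    void replayCreateTable(long tableId) {
        tables.put(tableId, new StringBuilder());
    }

    // Unchecked version: if the create op has not been replayed yet,
    // tables.get() returns null and setLength() throws an NPE.
    void replayModifyCommentUnchecked(long tableId, String comment) {
        StringBuilder t = tables.get(tableId);
        t.setLength(0);
        t.append(comment);
    }

    // Checked version: the missing table becomes a MetaNotFoundException.
    void replayModifyComment(long tableId, String comment) throws MetaNotFoundException {
        StringBuilder t = tables.get(tableId);
        if (t == null) {
            throw new MetaNotFoundException("table " + tableId + " does not exist");
        }
        t.setLength(0);
        t.append(comment);
    }
}
```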
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.