[ 
https://issues.apache.org/jira/browse/HBASE-10636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell resolved HBASE-10636.
-----------------------------------------
      Assignee:     (was: Honghua Feng)
    Resolution: Abandoned

> HBaseAdmin.deleteTable isn't 'really' synchronous in that still some cleanup 
> in HMaster after client thinks deleteTable() succeeds
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-10636
>                 URL: https://issues.apache.org/jira/browse/HBASE-10636
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Client, master
>            Reporter: Honghua Feng
>            Priority: Major
>
> In HBaseAdmin.deleteTable():
> {code}
> public void deleteTable(final TableName tableName) throws IOException {
>     // Wait until all regions deleted
>     for (int tries = 0; tries < (this.numRetries * this.retryLongerMultiplier); tries++) {
>         // let us wait until hbase:meta table is updated and
>         // HMaster removes the table from its HTableDescriptors
>         if (values == null || values.length == 0) {
>           tableExists = false;
>           GetTableDescriptorsResponse htds;
>           MasterKeepAliveConnection master = connection.getKeepAliveMasterService();
>           try {
>             GetTableDescriptorsRequest req =
>               RequestConverter.buildGetTableDescriptorsRequest(tableName);
>             htds = master.getTableDescriptors(null, req);
>           } catch (ServiceException se) {
>             throw ProtobufUtil.getRemoteException(se);
>           } finally {
>             master.close();
>           }
>           tableExists = !htds.getTableSchemaList().isEmpty();
>           if (!tableExists) {
>             break;
>           }
>         }
>       }
> {code}
> The client considers deleteTable() successful as soon as it can no longer
> retrieve the table descriptor.
> But in HMaster, the DeleteTableHandler that actually deletes the table does:
> {code}
>   protected void handleTableOperation(List<HRegionInfo> regions)
>   throws IOException, KeeperException {
>     // 1. Wait because of region in transition
>     ....
>     // 2. Remove regions from META
>     LOG.debug("Deleting regions from META");
>     MetaEditor.deleteRegions(this.server.getCatalogTracker(), regions);
>     // 3. Move the table in /hbase/.tmp
>     MasterFileSystem mfs = this.masterServices.getMasterFileSystem();
>     Path tempTableDir = mfs.moveTableToTemp(tableName);
>     try {
>       // 4. Delete regions from FS (temp directory)
>       FileSystem fs = mfs.getFileSystem();
>       for (HRegionInfo hri: regions) {
>         LOG.debug("Archiving region " + hri.getRegionNameAsString() + " from FS");
>         HFileArchiver.archiveRegion(fs, mfs.getRootDir(),
>             tempTableDir, new Path(tempTableDir, hri.getEncodedName()));
>       }
>       // 5. Delete table from FS (temp directory)
>       if (!fs.delete(tempTableDir, true)) {
>         LOG.error("Couldn't delete " + tempTableDir);
>       }
>       LOG.debug("Table '" + tableName + "' archived!");
>     } finally {
>       // 6. Update table descriptor cache
>       LOG.debug("Removing '" + tableName + "' descriptor.");
>       this.masterServices.getTableDescriptors().remove(tableName);
>       // 7. Clean up regions of the table in RegionStates.
>       LOG.debug("Removing '" + tableName + "' from region states.");
>       states.tableDeleted(tableName);
>       // 8. If entry for this table in zk, and up in AssignmentManager, remove it.
>       LOG.debug("Marking '" + tableName + "' as deleted.");
>       am.getZKTable().setDeletedTable(tableName);
>     }
>     if (cpHost != null) {
>       cpHost.postDeleteTableHandler(this.tableName);
>     }
>   }
> {code}
> Removing the regions from RegionStates, marking the table as deleted in ZK, and
> calling the coprocessor's postDeleteTableHandler all happen after the table is
> removed from the TableDescriptors cache.
> So client code that relies on RegionStates/ZKTable/CP being cleaned up once
> deleteTable() returns may fail if its requests hit HMaster before those
> three cleanup steps are done.
> Indeed, when I add a sleep of about 200 ms after the line below to simulate a
> slow-running HMaster:
> {code}
> this.masterServices.getTableDescriptors().remove(tableName);
> {code}
> some unit tests (such as moveRegion, or confirming the postDeleteTable CP
> immediately after deleteTable) no longer pass.
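
The ordering problem described above can be reproduced outside HBase with a small, self-contained sketch. Everything in it is hypothetical and only for illustration: `ToyMaster`, its three sets, and `clientSeesStaleState()` are stand-ins, not HBase classes or APIs. The toy handler removes the descriptor first, as in steps 6 to 8 of the quoted handler, and a latch plays the role of the injected 200 ms sleep so the outcome is deterministic.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

class DeleteTableRaceSketch {

    /** Toy stand-in for master-side state. Hypothetical; NOT the real HMaster. */
    static final class ToyMaster {
        final Set<String> descriptors = ConcurrentHashMap.newKeySet();
        final Set<String> regionStates = ConcurrentHashMap.newKeySet();
        final Set<String> zkTables = ConcurrentHashMap.newKeySet();

        void createTable(String table) {
            descriptors.add(table);
            regionStates.add(table);
            zkTables.add(table);
        }

        /** Same order as the quoted handler: descriptor first, other cleanup later. */
        void deleteTable(String table, CountDownLatch descriptorRemoved,
                         CountDownLatch resume) throws InterruptedException {
            descriptors.remove(table);       // step 6: descriptor cache
            descriptorRemoved.countDown();   // the client can now observe "deleted"
            resume.await();                  // stands in for a slow-running master
            regionStates.remove(table);      // step 7: region states
            zkTables.remove(table);          // step 8: ZK table state
        }
    }

    /** Returns true if the client saw leftover state after its "deleted" check passed. */
    static boolean clientSeesStaleState() {
        ToyMaster master = new ToyMaster();
        master.createTable("t1");
        CountDownLatch descriptorRemoved = new CountDownLatch(1);
        CountDownLatch resume = new CountDownLatch(1);

        Thread handler = new Thread(() -> {
            try {
                master.deleteTable("t1", descriptorRemoved, resume);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        handler.start();

        try {
            descriptorRemoved.await();
            // The client's only success criterion: the descriptor is gone.
            boolean clientThinksDeleted = !master.descriptors.contains("t1");
            // A follow-up request arriving now still finds the table elsewhere.
            boolean stale = master.regionStates.contains("t1")
                    || master.zkTables.contains("t1");
            resume.countDown();
            handler.join();
            return clientThinksDeleted && stale;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("client saw stale state: " + clientSeesStaleState());
    }
}
```

Because the toy handler is held between the descriptor removal and the remaining cleanup, the client-side check passes while the other two sets still contain the table, which is the window the description reports.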



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
