Hi,

There may well be a bug here. Can you clarify what "new metadata will be lost" means? I suspect that in most cases you can recover by running either REFRESH (if only files were added) or ALTER TABLE ... RECOVER PARTITIONS (if a new partition was dynamically created).

Dimitris

On Tue, Jun 6, 2017 at 5:04 AM, yu feng <[email protected]> wrote:

> Hi impala community:
>
> I have been using Impala in our environment. Here is our cluster deployment:
> 20+ impalad backends.
> 4 of the impalads act as coordinators.
> One catalogd and one statestored.
>
> I encountered a problem where one impalad's metadata is out of sync after a
> catalogd restart. I found that a DML operation was executing while catalogd
> was restarting.
> After analyzing the Impala source code, I reproduced the problem. These are
> my steps and analysis:
>
> 1. Start the Impala cluster.
> 2. The cluster runs for a long time with lots of metadata operations, so the
> current catalogVersion_ is large (for example, greater than 10000).
> 3. Submit a DML query (such as 'insert into xx partition() select xxx') to
> one impalad; the query runs for about 1 minute.
> 4. While the query is running, I stop catalogd and start it again just
> before the query executes QueryExecState->UpdateCatalog().
> 5. UpdateCatalog() sends an UpdateCatalog request to catalogd; catalogd
> updates the table's metadata and responds with the table's newest metadata.
> 6. After catalogd responds, UpdateCatalog() updates the metadata cached in
> the impalad (by calling updateCatalogCache()), which runs the following code:
>
>     if (!catalogServiceId_.equals(req.getCatalog_service_id())) {
>       boolean firstRun = catalogServiceId_.equals(INITIAL_CATALOG_SERVICE_ID);
>       catalogServiceId_ = req.getCatalog_service_id();
>       if (!firstRun) {
>         // Throw an exception which will trigger a full topic update request.
>         throw new CatalogException("Detected catalog service ID change. Aborting " +
>             "updateCatalog()");
>       }
>     }
>
> The service ID in the request belongs to the newly started catalogd and does
> not equal the impalad's catalogServiceId_, so the function throws a
> CatalogException and the query fails with an EXCEPTION. What is more, the
> impalad's catalogServiceId_ is now set to the new one.
>
> 7. After catalogd starts successfully, it publishes all metadata to
> statestored, which pushes it to the impalad. Because of step 6, the impalad's
> catalogServiceId_ already equals catalogd's service ID, so no exception is
> thrown.
>
> 8. Normally, step 7 would throw the CatalogException and set from_version
> to 0, and statestored would send the full metadata to the impalad in the next
> UpdateState().
>
> 9. After all steps finish, the impalad is out of sync. All new metadata
> operations will be lost because CatalogObjectCache.add() requires that a 'new
> item will only be added if it has a larger catalog version'.
>
> Please help confirm whether this is correct. If not, is there any other
> possible cause of the problem? If so, maybe it is a bug, or do you have any
> suggestions for avoiding the problem?
>
> Thanks a lot.
>
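The sequence in steps 6 through 9 can be modeled with a small, self-contained Java sketch. It is not Impala's code: `CatalogVersionSketch`, `VersionedCache`, `Coordinator`, `applyDmlResult` and `applyTopicUpdate` are hypothetical names, a `false` return stands in for the CatalogException, and clearing the cache stands in for the full topic update (an assumption about what that update achieves). The sketch also assumes that a restarted catalogd hands out versions lower than the ones already cached, which is what step 2 implies.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the race; not Impala's actual classes.
class CatalogVersionSketch {
    // Models the rule quoted from CatalogObjectCache.add(): an item is only
    // stored if its catalog version is larger than the cached one.
    static class VersionedCache {
        private final Map<String, Long> versions = new HashMap<>();

        boolean add(String name, long catalogVersion) {
            Long existing = versions.get(name);
            if (existing != null && existing >= catalogVersion) return false;
            versions.put(name, catalogVersion);
            return true;
        }

        void clear() { versions.clear(); }
    }

    // Models an impalad that remembers which catalogd instance it talks to.
    static class Coordinator {
        static final String INITIAL_ID = "initial";
        private String catalogServiceId = INITIAL_ID;
        private final VersionedCache cache = new VersionedCache();

        // Records the new service ID and reports whether it changed
        // (the very first ID seen does not count as a change).
        private boolean serviceChanged(String serviceId) {
            if (catalogServiceId.equals(serviceId)) return false;
            boolean firstRun = catalogServiceId.equals(INITIAL_ID);
            catalogServiceId = serviceId;
            return !firstRun;
        }

        // DML path (step 6): a changed ID only fails the query.
        boolean applyDmlResult(String serviceId, String table, long version) {
            if (serviceChanged(serviceId)) return false; // stands in for the exception
            return cache.add(table, version);
        }

        // Statestore path (steps 7 and 8): a changed ID forces a full reload,
        // modeled here (as an assumption) by clearing the cache first.
        boolean applyTopicUpdate(String serviceId, String table, long version) {
            if (serviceChanged(serviceId)) cache.clear();
            return cache.add(table, version);
        }
    }

    // Steps 1 to 9: the DML path consumes the ID change, so the later topic
    // update is rejected by the version check. Returns true if the cache is stale.
    static boolean staleAfterRace() {
        Coordinator c = new Coordinator();
        c.applyTopicUpdate("catalogd-A", "t", 10000);
        c.applyDmlResult("catalogd-B", "t", 5);        // query fails, ID now B
        return !c.applyTopicUpdate("catalogd-B", "t", 6);
    }

    // Without the in-flight DML, the topic update sees the ID change itself.
    static boolean freshWithoutRace() {
        Coordinator c = new Coordinator();
        c.applyTopicUpdate("catalogd-A", "t", 10000);
        return c.applyTopicUpdate("catalogd-B", "t", 6);
    }

    public static void main(String[] args) {
        System.out.println("stale after race: " + staleAfterRace());
        System.out.println("fresh without race: " + freshWithoutRace());
    }
}
```

In this model the only difference between the two runs is which code path observes the service ID change first. If the analysis in the thread is right, one possible direction would be to avoid updating the stored ID on the DML path, so that the statestore path still detects the change; whether that fits the real code would need to be checked against the Impala source.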
