I said "new metadata will be lost"  means the following metadata operation
which happened to the existing table will be lost until the table's version
catch up with the older version. I think any operation can not recover it
because impalad update local cached metadata by comparing new version and
older version.
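
To make the comparison I mean concrete, here is a minimal sketch of a
version-gated cache. This is a simplified, hypothetical stand-in I wrote for
illustration only; the real logic lives in CatalogObjectCache, and the class,
field and method names below are my own, not Impala's:

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical sketch, not actual Impala code: a cache that only accepts
    // an object when its catalog version is strictly larger than the cached one.
    class VersionedCache<T> {
      private static class Entry<T> {
        final long catalogVersion;
        final T value;
        Entry(long catalogVersion, T value) {
          this.catalogVersion = catalogVersion;
          this.value = value;
        }
      }

      private final Map<String, Entry<T>> cache = new HashMap<>();

      // Returns false (and drops the update) if the cached version is >= the new one.
      synchronized boolean add(String name, long catalogVersion, T value) {
        Entry<T> existing = cache.get(name);
        if (existing != null && existing.catalogVersion >= catalogVersion) {
          return false;  // stale or equal version: keep the cached entry
        }
        cache.put(name, new Entry<>(catalogVersion, value));
        return true;
      }
    }

With this behavior, a table cached at version 10000 before the restart keeps
rejecting updates from the restarted catalogd until its version counter grows
past 10000 again.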

I tried operations that trigger a reload of the table's metadata and generate
a new version, such as REFRESH and ALTER TABLE. But the new metadata is always
lost until the catalog_version grows larger than the old one. For newly
created catalog objects (CREATE TABLE, CREATE DATABASE, etc.), the metadata
is up-to-date.

I think this is a bug. We should keep the old catalogServiceId_ until a full
(non-delta) metadata update pushed by the statestored has been applied, even
if every metadata operation fails with an exception during that gap. Perhaps
there are better solutions.
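
To illustrate the idea, here is a rough sketch of how that check could behave.
This is my own hypothetical rewrite of the snippet quoted below in my original
mail, not a patch against the actual ImpaladCatalog code, and the exact call
paths are an assumption on my part:

    if (!catalogServiceId_.equals(req.getCatalog_service_id())) {
      boolean firstRun = catalogServiceId_.equals(INITIAL_CATALOG_SERVICE_ID);
      if (!firstRun) {
        // Keep the old catalogServiceId_ here instead of adopting the new one,
        // so the next statestore update still detects the service ID change,
        // throws, and forces a full (from_version = 0) topic update.
        throw new CatalogException("Detected catalog service ID change. Aborting " +
            "updateCatalog() until a full catalog update has been applied.");
      }
      catalogServiceId_ = req.getCatalog_service_id();
    }

The only change compared to the quoted snippet is that catalogServiceId_ is not
overwritten before the exception is thrown, so the service ID mismatch is still
visible to the statestore update path and a full topic update gets requested.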

Thanks a lot.

2017-06-07 0:58 GMT+08:00 Dimitris Tsirogiannis <[email protected]>:

> Hi,
>
> It could be that there is a looming bug here. Can you clarify what "new
> metadata will be lost" means? I suspect that in most cases you can recover
> by running either refresh (if only files were added) or recover partitions
> (if a new partition was dynamically created).
>
> Dimitris
>
> On Tue, Jun 6, 2017 at 5:04 AM, yu feng <[email protected]> wrote:
>
> > Hi impala community:
> >
> > I have been using Impala in our environment. Here is our cluster deployment:
> > 20+ impalad backends.
> > 4 of the impalads act as coordinators.
> > One catalogd and one statestored.
> >
> >
> > I encountered a problem where one impalad's metadata is out of sync after a
> > catalogd restart. I found that while the catalogd was restarting, a DML
> > operation was executing.
> > After analyzing the Impala source code, I was able to reproduce the problem.
> > These are my steps and analysis:
> >
> > 1. Start the Impala cluster.
> > 2. The cluster runs for a long time with lots of metadata operations, so the
> > current catalogVersion_ is large (e.g. greater than 10000).
> > 3. Submit a DML query (such as 'insert into xx partition() select xxx') to
> > one impalad; the query runs for about 1 minute.
> > 4. While the query is running, stop catalogd, and start it again just
> > before the query executes QueryExecState->UpdateCatalog().
> > 5. UpdateCatalog() sends an UpdateCatalog request to the catalogd, which
> > updates the table's metadata and responds with the newest metadata of
> > the table.
> > 6. After the catalogd responds, UpdateCatalog() updates the metadata cached in
> > the impalad (by calling updateCatalogCache()) and then runs the following code:
> >
> >     if (!catalogServiceId_.equals(req.getCatalog_service_id())) {
> >       boolean firstRun = catalogServiceId_.equals(INITIAL_CATALOG_SERVICE_ID);
> >       catalogServiceId_ = req.getCatalog_service_id();
> >       if (!firstRun) {
> >         // Throw an exception which will trigger a full topic update request.
> >         throw new CatalogException("Detected catalog service ID change. Aborting " +
> >             "updateCatalog()");
> >       }
> >     }
> >
> > The service ID in the response is the newly started catalogd's service ID and
> > does not equal the impalad's catalogServiceId_, so the function throws a
> > CatalogException and the query ends in EXCEPTION. What is more, the impalad's
> > catalogServiceId_ is now set to the new one.
> >
> > 7. After the catalogd has started successfully, it publishes all metadata to
> > the statestored, which then pushes it to the impalad. Because of step 6, the
> > impalad's catalogServiceId_ already equals the catalogd's service ID, so no
> > exception is thrown.
> >
> > 8. In the normal case, step 7 would throw the CatalogException, set
> > from_version to 0, and the statestored would send the full metadata to the
> > impalad in the next UpdateState().
> >
> > 9. After all these steps finish, the impalad is out of sync, and all new
> > metadata operations will be lost because CatalogObjectCache.add() requires
> > that 'a new item will only be added if it has a larger catalog version'.
> >
> > Please help confirm whether this analysis is correct. If not, is there any
> > other possible cause of the problem? If so, maybe it is a bug, or do you have
> > any suggestions for avoiding the problem?
> >
> > Thanks a lot.
> >
>
