To be more precise, what I saw was that the checkpoint file was actually there but zero-length, if memory serves (and hence corrupt).
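Abdullah's message quoted below describes an invariant: the old snapshot is removed only after a complete new one exists. A common way to guarantee that, and to ensure a reader never sees a zero-length checkpoint, is to write the new snapshot to a temporary file, force it to disk, and atomically rename it over the old one. The following is a generic Python sketch under stated assumptions, not AsterixDB's actual checkpoint code. The JSON format and the function name `write_checkpoint_atomically` are hypothetical.

```python
import json
import os
import tempfile


def write_checkpoint_atomically(path, checkpoint):
    """Hypothetical helper: persist `checkpoint` (a JSON-serializable dict).

    Readers of `path` see either the previous complete checkpoint or the
    new complete one, never a partially written or empty file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Write the new snapshot to a temp file in the same directory, so the
    # final rename stays within one file system (required for atomicity).
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".checkpoint-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(checkpoint, tmp)
            tmp.flush()
            # Force the new bytes to disk before the old snapshot is replaced.
            os.fsync(tmp.fileno())
        # Atomically replace the old snapshot with the fully written new one.
        os.replace(tmp_path, path)
    except BaseException:
        # On any failure, discard the partial temp file; the old snapshot is untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

On POSIX systems, making the rename itself durable across a crash also requires an fsync of the containing directory; that step is omitted here for brevity.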
On Wed, Nov 29, 2017 at 2:11 PM, abdullah alamoudi wrote:
> I wonder how it got to that state.
>
> The first thing an instance does after initialization is create the
> snapshot file. This file is only deleted after a new (uncorrupted)
> snapshot file has been created.
>
> I understand your point, but I wonder how it got to this state. Bug!?
>
> Cheers,
> Abdullah.
>
>> On Nov 29, 2017, at 1:54 PM, Chen Luo wrote:
>>
>> Hi devs,
>>
>> Recently I ran into a very annoying issue with recovery. The
>> checkpoint file of my dataset was somehow corrupted (and I didn't
>> know why). When I restarted AsterixDB, it failed to read the
>> checkpoint file and started recovery from a clean state. This is
>> highly undesirable: it silently cleaned up all of my experiment
>> datasets, roughly 100GB, and it will take me days to re-ingest this
>> data to resume my experiments.
>>
>> I think cleaning up all data when some small thing goes wrong is
>> undesirable and dangerous. When AsterixDB fails to restart and finds
>> the data directory non-empty, I think it should notify the user and
>> let the user make the decision. For example, it could refuse to
>> start at that point, and the user could then clean up the directory
>> manually, try a backup checkpoint file, or add a flag to force the
>> restart. Either way, blindly cleaning up all files seems to be a
>> dangerous solution.
>>
>> Any thoughts on this?
>>
>> Best regards,
>> Chen Luo
>
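The startup behavior Chen proposes above (refuse to wipe a non-empty data directory when the checkpoint cannot be read, unless the operator explicitly forces it) could look roughly like the following. This is a hypothetical Python sketch, not AsterixDB code: `load_checkpoint_or_refuse`, `CheckpointError`, the JSON checkpoint format, and the `force_clean` flag are all illustrative assumptions. It treats a zero-length file, like the one described at the top of this message, as corrupt.

```python
import json
import os


class CheckpointError(Exception):
    """Hypothetical error: checkpoint unusable and existing data would be lost."""


def load_checkpoint_or_refuse(checkpoint_path, data_dir, force_clean=False):
    """Return the parsed checkpoint, or None if starting clean is acceptable.

    A missing, zero-length, or unparseable checkpoint next to a non-empty
    data directory raises CheckpointError instead of allowing a silent wipe,
    unless the operator explicitly passes force_clean=True.
    """
    try:
        if os.path.getsize(checkpoint_path) == 0:
            raise ValueError("checkpoint file is empty")
        with open(checkpoint_path) as f:
            return json.load(f)
    except (OSError, ValueError) as exc:  # JSONDecodeError is a ValueError
        problem = exc

    has_data = os.path.isdir(data_dir) and bool(os.listdir(data_dir))
    if not has_data:
        return None  # Fresh instance: nothing to lose, start clean.
    if force_clean:
        return None  # Operator explicitly accepted losing the existing data.
    raise CheckpointError(
        f"cannot read checkpoint {checkpoint_path!r} ({problem}); "
        f"{data_dir!r} is not empty, refusing to clean it. "
        "Restore a backup checkpoint or restart with force_clean=True."
    )
```

The function never deletes anything itself: returning None only tells the caller that a clean start is allowed, so the destructive step can run only after this check has passed.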
