I wonder how it got to that state. The first thing an instance does after initialization is create the snapshot file. This will only be deleted after a new (uncorrupted) snapshot file is created.
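
To make that ordering concrete, here is a minimal sketch in Python. It is not AsterixDB's actual code: the `snapshot_<id>` file naming, the CRC header, and the functions `write_snapshot` and `read_snapshot` are all assumptions made for illustration. It writes the new snapshot to a temporary file, syncs it, renames it into place, verifies it, and only then deletes older snapshots.

```python
import json
import os
import tempfile
import zlib


def read_snapshot(path):
    """Return the snapshot payload, or None if the file is missing or corrupted."""
    try:
        with open(path, "rb") as f:
            header, _, body = f.read().partition(b"\n")
        # The first line holds the CRC32 of the body in hex (hypothetical format).
        if int(header.decode("ascii"), 16) != zlib.crc32(body):
            return None
        return json.loads(body.decode("utf-8"))
    except (OSError, ValueError):
        return None


def write_snapshot(directory, snapshot_id, payload):
    """Write snapshot_<id>, verify it, then delete older snapshot files."""
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    record = b"%08x\n" % zlib.crc32(body) + body
    final_name = "snapshot_%d" % snapshot_id
    final_path = os.path.join(directory, final_name)

    # Write to a temporary file first so a crash never leaves a half-written
    # file under the final name.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix="tmp_")
    with os.fdopen(fd, "wb") as f:
        f.write(record)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, final_path)

    # Only after the new file reads back cleanly are the old ones removed.
    if read_snapshot(final_path) is None:
        raise IOError("new snapshot failed verification: " + final_path)
    for name in os.listdir(directory):
        if name.startswith("snapshot_") and name != final_name:
            os.remove(os.path.join(directory, name))
    return final_path
```

With this ordering there should always be at least one readable snapshot on disk, which is why a corrupted one is surprising.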
I understand your point, but I wonder how it got to this state. Bug!?

Cheers,
Abdullah.

> On Nov 29, 2017, at 1:54 PM, Chen Luo <[email protected]> wrote:
>
> Hi devs,
>
> Recently I ran into a very annoying recovery issue. The
> checkpoint file of my dataset was somehow corrupted (and I don't know
> why). When I restarted AsterixDB, it failed to read the
> checkpoint file and started recovering from a clean state. This is highly
> undesirable because it silently cleaned up all of my experiment
> datasets, roughly 100GB. It will take me days to re-ingest the data and
> resume my experiments.
>
> I think cleaning up all data when some small thing goes
> wrong is undesirable and dangerous. When AsterixDB fails to restart and
> finds the data directory non-empty, I think it should notify the user and
> let the user make the decision. For example, it could refuse to restart at
> that point, and the user could clean up the directory manually, try to use a
> backup checkpoint file, or add some flag to force the restart. In any case,
> blindly cleaning up all files seems to be a dangerous solution.
>
> Any thoughts on this?
>
> Best regards,
> Chen Luo
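
The behavior proposed in the quoted message could look roughly like the sketch below. This is only an illustration in Python, not AsterixDB code: `decide_startup`, `RecoveryRefused`, and the `force_clean` flag are hypothetical names, and the caller is assumed to have already determined whether the checkpoint file is readable.

```python
import os


class RecoveryRefused(Exception):
    """Raised when startup would otherwise destroy existing data."""


def decide_startup(data_dir, checkpoint_ok, force_clean=False):
    """Return "recover" or "clean_start", or refuse if data would be lost.

    checkpoint_ok: True if the checkpoint file was read successfully.
    force_clean:   hypothetical operator flag that explicitly allows wiping.
    """
    has_data = os.path.isdir(data_dir) and len(os.listdir(data_dir)) > 0

    if checkpoint_ok:
        return "recover"
    if not has_data:
        # Nothing on disk, so a clean start loses nothing.
        return "clean_start"
    if force_clean:
        # The operator explicitly asked for the data to be discarded.
        return "clean_start"
    # Unreadable checkpoint plus a non-empty data directory: stop and let
    # the user decide instead of deleting anything.
    raise RecoveryRefused(
        "checkpoint unreadable but %s is not empty; restore a backup "
        "checkpoint, clean the directory manually, or pass force_clean"
        % data_dir
    )
```

The point of the design is that the destructive path is only reachable through an explicit choice by the user, never as a silent fallback.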
