On Wed, Nov 16, 2011 at 9:28 AM, goran kent <[email protected]> wrote:
> Is that stale segment folder from a previous unrelated index/merge
> session, or is it from the current session which has crashed/failed
> and this is part of the cleanup procedure? It seems to be the former,
> am I right? The "_prep_" in SegWriter_prep_seg_dir() seems to imply
> this is a brand new session trying to create the seg_N folder, which
> throws an exception since the folder already exists.
>
> I'll start some debugging sometime today to try and track down where
> the hell that crash is happening, but I just wanted to clarify my
> understanding of the code.
>
> btw, if seg_N is empty, why is Folder_Delete_Tree() failing to trash
> it? Maybe because the stale write.lock is still soiling the
> situation? (grep -rl '^Folder_Delete_Tree' * failed to find anything,
> so I couldn't have a quick look to confirm that idea)
Looks like the lock file is for the current session (the PID therein
and the timestamp both match up), not for a previous unrelated
crashed session.
So, it locks the index successfully, does something, then tries to remove seg_4:
drwxr-xr-x 2 root root 4.0K Nov 16 01:30 seg_4
drwxr-xr-x 2 root root 4.0K Nov 16 01:30 locks
-rw-r--r-- 1 root root 119 Nov 7 10:07 snapshot_3.json
-rw-r--r-- 1 root root 13K Nov 7 10:07 schema_3.json
drwxr-xr-x 2 root root 4.0K Nov 7 10:07 seg_3
drwxr-xr-x 2 root root 4.0K Nov 7 09:48 seg_2
drwxr-xr-x 2 root root 4.0K Nov 6 22:42 seg_1
-rw-r--r-- 1 root root 54 Nov 16 01:30 write.lock
Contents of write.lock:
{
"host": "",
"name": "write",
"pid": "26271"
}
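As an aside, one quick way to tell whether the PID recorded in a
write.lock still belongs to a live process is to probe it with signal 0.
This is just a standalone sketch for diagnosis, not Lucy API; the
function name is mine:

```python
import json
import os

def lock_pid_alive(lock_path):
    """Return True if the PID recorded in a Lucy-style write.lock
    (JSON with a "pid" field) refers to a currently running process."""
    with open(lock_path) as fh:
        pid = int(json.load(fh)["pid"])
    try:
        os.kill(pid, 0)      # signal 0: existence probe, sends nothing
        return True
    except ProcessLookupError:
        return False         # no such process; the lock is stale
    except PermissionError:
        return True          # process exists but belongs to another user
```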
This happened during an automated run. When I re-ran it manually
today, it succeeded (i.e., seg_4 was ignored, seg_5 was created, the
lock file was purged, etc.).
I'm trying to get my head around what could be going wrong here so I
can automate self-healing, or at least handle this scenario more
gracefully.
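Until the root cause turns up, the self-healing I have in mind might
look something like this before each automated run: if write.lock
exists but its PID is dead, remove the lock and let the next session
deal with the orphan seg_N itself (the manual re-run showed it handles
that fine). A sketch under those assumptions; heal_index and the
cleanup policy are mine, not Lucy internals:

```python
import json
import os

def heal_index(index_dir):
    """Remove a stale write.lock (and only the lock) when the process
    that created it is no longer running, so the next indexing session
    can acquire the lock and ignore/replace the orphan seg_N itself.
    Returns True if a stale lock was removed."""
    lock_path = os.path.join(index_dir, "write.lock")
    if not os.path.exists(lock_path):
        return False
    with open(lock_path) as fh:
        pid = int(json.load(fh)["pid"])
    try:
        os.kill(pid, 0)       # probe only; raises if the PID is gone
        return False          # lock holder still alive: leave it alone
    except ProcessLookupError:
        os.remove(lock_path)  # crashed session left this behind
        return True
    except PermissionError:
        return False          # PID alive under another user: keep lock
```

Note this deliberately does not touch the seg_N directory, since the
indexer ignored seg_4 and carried on with seg_5 when run by hand.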