On 5/4/2014 2:18 PM, Graham Leggett wrote:


Nothing in the above seems to indicate an error that I can see, but we now see 
this two seconds later:

[04/May/2014:23:03:38 +0200] - ERROR bulk import abandoned
[04/May/2014:23:03:38 +0200] - import userRoot: Aborting all Import threads...
[04/May/2014:23:03:43 +0200] - import userRoot: Import threads aborted.
[04/May/2014:23:03:43 +0200] - import userRoot: Closing files...
[04/May/2014:23:03:43 +0200] - libdb: userRoot/uid.db4: unable to flush: No such file or directory
This indicates some sort of deep badness.
It appears that despite the initial sync having failed, we ignore the above 
error and pretend all is OK. I suspect this is why we're getting the weird 
messages below.

Yes, the prime error seems to be the database file error above. Once you have that, all bets are off.

So, hmm... "no such file" (ENOENT) is very, very odd. Is there anything peculiar about the filesystem this server is using? Anything funky with permissions (although you'd expect an EACCES or EPERM in that case)?

The file (uid.db4 et al) would have been opened previously (or should have been). It is perplexing that the error would show up on the fsync(). How does a file exist one second, then not the next? I'm guessing that the error code has been mangled, or means something different from what might be deduced from the text.
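
To illustrate why ENOENT on a flush looks so strange, here is a minimal Python sketch of plain POSIX behaviour. It says nothing about what libdb does internally; the function names are made up for the illustration. On a typical Linux filesystem, fsync() on a descriptor that is already open should still succeed after the file name has been removed, whereas ENOENT is what you get from a call that looks the file up by path.

```python
import errno
import os
import tempfile


def fsync_after_unlink() -> bool:
    """Return True if fsync() succeeds on a file unlinked after it was opened."""
    fd, path = tempfile.mkstemp()
    try:
        os.unlink(path)        # remove the name; the open descriptor stays valid
        os.write(fd, b"data")
        os.fsync(fd)           # raises OSError if the kernel reports a failure
        return True
    except OSError as exc:
        print("failed:", errno.errorcode.get(exc.errno, exc.errno), exc.strerror)
        return False
    finally:
        os.close(fd)


def errno_from_reopen_by_path() -> int:
    """Return the errno raised when re-opening a removed file by its path."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    os.unlink(path)
    try:
        os.close(os.open(path, os.O_RDWR))
        return 0
    except OSError as exc:
        return exc.errno


if __name__ == "__main__":
    print("fsync on unlinked file ok:", fsync_after_unlink())
    code = errno_from_reopen_by_path()
    print("re-open by path gives:", errno.errorcode.get(code, code))
```

If that holds on the server's filesystem too, an ENOENT reported at flush time would point at something opening or looking up the file by name at that moment, or at an error code that was changed on its way to the log. That is a guess to be checked, not a diagnosis.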

It might be worth running the "strace" command with -p <pid of ns-slapd>, starting it before the replica init operation, to see what kernel calls the process is making. Also try turning up the logging "to 11" (not actually 11, but Spinal Tap-style; I think it is 65535 to get all logging output).
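
A rough sketch of the strace step, in Python. The pid, the output path and the sample trace lines below are all made up for illustration; the helper names are hypothetical too. The first function only builds an strace command line that attaches to a running process, and the second picks out the calls that returned ENOENT from the saved trace.

```python
import re
from typing import Iterable, List


def strace_argv(pid: int, out_path: str) -> List[str]:
    """Build an strace command line that attaches to a running process."""
    return [
        "strace",
        "-f",                      # follow the other threads of the process too
        "-tt",                     # wall-clock timestamps, to match the error log
        "-e", "trace=file,desc",   # file-name and file-descriptor syscall classes
        "-o", out_path,            # write the trace to a file
        "-p", str(pid),            # attach to the given pid
    ]


# strace prints failed calls as "... = -1 ENOENT (No such file or directory)".
_ENOENT = re.compile(r"=\s*-1\s+ENOENT\b")


def enoent_calls(lines: Iterable[str]) -> List[str]:
    """Return the trace lines whose syscall failed with ENOENT."""
    return [line.rstrip("\n") for line in lines if _ENOENT.search(line)]


if __name__ == "__main__":
    # Hypothetical pid and output file, for illustration only.
    print(" ".join(strace_argv(12345, "/tmp/ns-slapd.strace")))

    # Invented sample lines in strace's usual output shape.
    sample = [
        '12345 23:03:43.000001 fsync(27) = 0\n',
        '12345 23:03:43.000002 open("/some/path/uid.db4", O_RDWR) = -1 ENOENT (No such file or directory)\n',
    ]
    for hit in enoent_calls(sample):
        print(hit)
```

strace would need to be run as root (or as the user that owns ns-slapd), started before the replica init and stopped once the error has shown up in the log. The interesting part is which syscall, on which path, returns the ENOENT.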

You could also try an "import" of some LDIF data into that same server. It will exercise the same code as far as opening and writing to the database files. It would be interesting to see whether that throws the same ENOENT error, or not.


--
389 users mailing list
[email protected]
https://admin.fedoraproject.org/mailman/listinfo/389-users
