Re: [systemd-devel] list-boots is incorrect, was: lost journal persistence

Lennart Poettering Thu, 15 May 2014 14:56:32 -0700

On Thu, 15.05.14 12:01, Chris Murphy (li...@colorremedies.com) wrote:

> 
> 
> On May 12, 2014, at 9:58 AM, Chris Murphy <li...@colorremedies.com> wrote:
> 
> > 
> > On May 12, 2014, at 7:06 AM, Kirill Elagin <kirela...@gmail.com> wrote:
> > 
> > >> Could it be that all the boot ids are actually the same for some reason?
> > >> I had this issue in a container when systemd was reading boot_id from 
> > >> `/proc/sys/kernel/random/boot_id`, and since /proc was bind-mounted, 
> > >> boot_id was always the host's boot_id.
> >> 
> >> You can also run `journalctl -F _BOOT_ID` to see a set of all the boot ids 
> > >> recorded in the journal (this must agree with `journalctl --list-boots`).
> > 
> > 
> > # journalctl --list-boots | wc -l
> > 36
> > [root@rawhide ~]# journalctl -F _BOOT_ID | wc -l
> > 80
> > # cat /proc/sys/kernel/random/boot_id
> > 420fa190-e7dd-4cd7-b248-fd62417d7c02
> > # reboot
> > ###
> > # journalctl --list-boots | wc -l
> > 36
> > [root@rawhide ~]# journalctl -F _BOOT_ID | wc -l
> > 81
> > # cat /proc/sys/kernel/random/boot_id
> > 1e0d5346-85cb-477b-9ae2-2cfb53097b97
> > 
> > 
> > So there are more boot IDs than there are --list-boots entries, and 
> > --list-boots doesn't increment while the boot IDs do. And neither of these 
> > boot IDs matches any of the boot IDs in --list-boots.
> 
> Deleting the files in /var/log/journal/<machineid>/ fixes the problem on the 
> Fedora 20 and Fedora Rawhide VMs experiencing this problem. If the files are 
> restored, the problem returns. Yet --verify shows PASS for each log file. So 
> it seems there's some kind of corruption or confusion with these log files.
> 
> The btrfs filesystem is mounted with the autodefrag option, for which there have 
> been some problems with kernel 3.12 and older. But the Rawhide system has 
> only been using 3.14 and 3.15 kernels so if there's still some problem I'm 
> not sure how to isolate it since journalctl --verify seems to think the files 
> are OK.
> 
> Despite autodefrag being enabled, the system journal file has ~1450 fragments
> according to filefrag. If I boot from a rescue image so nothing is actively
> writing to these journal files, recopy them so that each is a single extent,
> and then reboot the system, the --list-boots output changes. It actually goes
> down, from 45 boots to 31 boots, and stays stuck across additional reboots,
> whereas the number of distinct _BOOT_ID values continues to increment as
> before. So just by copying the log files I get different --list-boots
> behavior. That's pretty strange.
> 
> And yes, of course, btrfs check and scrub also find no problems on either the
> Fedora 20 or the Fedora Rawhide Btrfs file system where this is happening.
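
The comparison Kirill suggested (distinct _BOOT_ID values versus --list-boots entries) can be scripted so the missing IDs are listed rather than just counted. A minimal sketch, assuming both `journalctl -F _BOOT_ID` and `journalctl --list-boots` print boot IDs as 32 lowercase hex digits without dashes; the function names and input files are hypothetical:

```shell
#!/bin/sh
# Sketch: list boot IDs that occur as _BOOT_ID values in the journal but
# do not appear in `journalctl --list-boots`.
# Assumption: both commands print boot IDs as 32 lowercase hex digits
# without dashes.

# Read text on stdin, print each 32-hex-digit boot ID once, sorted.
extract_ids() {
    grep -oE '[0-9a-f]{32}' | sort -u
}

# Print IDs found in file $1 (saved output of `journalctl -F _BOOT_ID`)
# that are absent from file $2 (saved output of `journalctl --list-boots`).
missing_boots() {
    ids_field=$(mktemp) || return 1
    ids_list=$(mktemp) || return 1
    extract_ids < "$1" > "$ids_field"
    extract_ids < "$2" > "$ids_list"
    # comm -23 keeps only the lines unique to the first sorted file.
    comm -23 "$ids_field" "$ids_list"
    rm -f "$ids_field" "$ids_list"
}

# Current boot ID from /proc, with dashes stripped to match journalctl.
current_boot_id() {
    sed 's/-//g' < /proc/sys/kernel/random/boot_id
}
```

Saving both outputs to files and running `missing_boots fields.txt boots.txt` prints the IDs that --list-boots skips, and `grep -c "$(current_boot_id)" boots.txt` checks whether the running boot is listed. Running both journalctl commands as the same user keeps differences in file readability out of the comparison.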


This is certainly weird... Does copying these files to ext4 change
anything?
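One way to try the ext4 comparison without touching the original files is journalctl's `-D`/`--directory` option, which reads journal files from a given directory. A minimal sketch, where the function names and the target path are hypothetical and the target is assumed to be on an ext4 file system:

```shell
#!/bin/sh
# Sketch: copy a journal directory to another file system (for example an
# ext4 mount) and count what journalctl reports for the copy.
# Assumptions: $1 is the source directory (e.g. /var/log/journal/<machineid>)
# and $2 is a hypothetical target directory on the file system to test.

# Count lines that contain a 32-hex-digit boot ID (skips any header line).
count_ids() {
    grep -cE '[0-9a-f]{32}'
}

compare_copy() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1
    # Copy every file in the journal directory, preserving timestamps/modes.
    cp -p "$src"/* "$dst"/ || return 1
    echo "list-boots entries: $(journalctl -D "$dst" --list-boots | count_ids)"
    echo "distinct _BOOT_ID: $(journalctl -D "$dst" -F _BOOT_ID | count_ids)"
}
```

For example, `compare_copy /var/log/journal/<machineid> /mnt/ext4/journal-copy` prints both counts for the copy, which can then be compared with the same commands run against the original directory.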

I wonder if this might have something to do with time changes? Maybe the
clock was not set correctly?

Maybe it is a permissions issue, with some files not accessible to some users?

Lennart

-- 
Lennart Poettering, Red Hat
_______________________________________________
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel
