On 2017-04-28 19:41, Chris Murphy wrote: > On Fri, Apr 28, 2017 at 11:05 AM, Goffredo Baroncelli > <kreij...@inwind.it> wrote: > >> In the past I faced the same problems; I collected some data here >> http://kreijack.blogspot.it/2014/06/btrfs-and-systemd-journal.html. >> Unfortunately the journald files are very bad, because first the data is >> written (appended), then the index fields are updated. Unfortunately these >> indexes are near after the last write . So fragmentation is unavoidable. >> >> After some thinking I adopted a different strategies: I used journald as >> collector, then I forward all the log to rsyslogd, which used a "log append" >> format. Journald never write on the root filesystem, only in tmp. > > The gotcha though is there's a pile of data in the journal that would > never make it to rsyslogd. If you use journalctl -o verbose you can > see some of this.
You can send *all the info* to rsyslogd via imjournal http://www.rsyslog.com/doc/v8-stable/configuration/modules/imjournal.html In my setup all the data are stored in json format in the /var/log/cee.log file: $ head /var/log/cee.log 2017-04-28T18:41:41.931273+02:00 venice liblogging-stdlog: @cee: { "PRIORITY": "6", "_BOOT_ID": "a86d74bab91f44dc974c76aceb97141f", "_MACHINE_ID": "e84907d099904117b355a99c98378dca", "_HOSTNAME": "venice.bhome", "_SYSTEMD_SLICE": "system.slice", "_UID": "0", "_GID": "0", "_CAP_EFFECTIVE": "3fffffffff", "_TRANSPORT": "syslog", "SYSLOG_FACILITY": "23", "SYSLOG_IDENTIFIER": "liblogging-stdlog", "MESSAGE": " [origin software=\"rsyslogd\" swVersion=\"8.24.0\" x-pid=\"737\" x-info=\"http:\/\/www.rsyslog.com\"] rsyslogd was HUPed", "_PID": "737", "_COMM": "rsyslogd", "_EXE": "\/usr\/sbin\/rsyslogd", "_CMDLINE": "\/usr\/sbin\/rsyslogd -n", "_SYSTEMD_CGROUP": "\/system.slice\/rsyslog.service", "_SYSTEMD_UNIT": "rsyslog.service", "_SYSTEMD_INVOCATION_ID": "18b9a8b27f9143728adef972db7b394c", "_SOURCE_REALTIME_TIMESTAMP": "1493397701931255", "msg": "[origin software=\"rsyslogd\" swVersion=\"8.24.0\" x-pid=\"737\" x-info=\"http:\/\/www.rsyslog.com\"] rsyslogd was HUPed" } 2017-04-28T18:41:42.058549+02:00 venice liblogging-stdlog: @cee: { "PRIORITY": "6", "_BOOT_ID": "a86d74bab91f44dc974c76aceb97141f", "_MACHINE_ID": "e84907d099904117b355a99c98378dca", "_HOSTNAME": "venice.bhome", "_SYSTEMD_SLICE": "system.slice", "_UID": "0", "_GID": "0", "_CAP_EFFECTIVE": "3fffffffff", "_TRANSPORT": "syslog", "SYSLOG_FACILITY": "23", "SYSLOG_IDENTIFIER": "liblogging-stdlog", "MESSAGE": " [origin software=\"rsyslogd\" swVersion=\"8.24.0\" x-pid=\"737\" x-info=\"http:\/\/www.rsyslog.com\"] rsyslogd was HUPed", "_PID": "737", "_COMM": "rsyslogd", "_EXE": "\/usr\/sbin\/rsyslogd", "_CMDLINE": "\/usr\/sbin\/rsyslogd -n", "_SYSTEMD_CGROUP": "\/system.slice\/rsyslog.service", "_SYSTEMD_UNIT": "rsyslog.service", "_SYSTEMD_INVOCATION_ID": "18b9a8b27f9143728adef972db7b394c", "_SOURCE_REALTIME_TIMESTAMP": "1493397702058441", "msg": "[origin software=\"rsyslogd\" swVersion=\"8.24.0\" x-pid=\"737\" x-info=\"http:\/\/www.rsyslog.com\"] rsyslogd was HUPed" } [....] All the info are stored with the same keys/values as journald does. I developed an utility (called clp), which allow to query the log by key, filtering by boot nr, by date.... For example to show all the log related to rsyslog $ clp log -t full-details _SYSTEMD_CGROUP=/system.slice/rsyslog.service 2017-04-21 19:12:29.579748 MESSAGE= [origin software="rsyslogd" swVersion="8.24.0" x-pid="804" x-info="http://www.rsyslog.com"] rsyslogd was HUPed PRIORITY=6 SYSLOG_FACILITY=23 SYSLOG_IDENTIFIER=liblogging-stdlog _BOOT_ID=d77198380c9344248e01166fbd8d60df _CAP_EFFECTIVE=3fffffffff _CMDLINE=/usr/sbin/rsyslogd -n _COMM=rsyslogd _EXE=/usr/sbin/rsyslogd _GID=0 _HOSTNAME=venice.bhome _LOGFILEINITLINE=2017-04-21T19:12:29.579768+02:00 venice liblogging-stdlog: _LOGFILELINENUMBER=1 _LOGFILENAME=/var/log/cee.log.7.gz _LOGFILETIMESTAMP=1492794749579768 _MACHINE_ID=e84907d099904117b355a99c98378dca _PID=804 _SOURCE_REALTIME_TIMESTAMP=1492794749579748 _SYSTEMD_CGROUP=/system.slice/rsyslog.service _SYSTEMD_INVOCATION_ID=8f9cb6c871be4158a3ccb374f4323027 _SYSTEMD_SLICE=system.slice _SYSTEMD_UNIT=rsyslog.service _TRANSPORT=syslog _UID=0 msg=[origin software="rsyslogd" swVersion="8.24.0" x-pid="804" x-info="http://www.rsyslog.com"] rsyslogd was HUPed 2017-04-21 19:12:29.669637 MESSAGE= [origin software="rsyslogd" swVersion="8.24.0" x-pid="804" x-info="http://www.rsyslog.com"] rsyslogd was HUPed PRIORITY=6 SYSLOG_FACILITY=23 SYSLOG_IDENTIFIER=liblogging-stdlog _BOOT_ID=d77198380c9344248e01166fbd8d60df _CAP_EFFECTIVE=3fffffffff _CMDLINE=/usr/sbin/r [...] > There's a bunch of extra metadata in the journal. > And then also filtering based on that metadata is useful rather than > being limited to grep on a syslog file. Which, you know, it's fine for > many use cases. With imjournal you don't loose any metadata or data. Unfortunately there no is an utility to query the log. In my case I developed a my own > I guess I'm just interested in whether there's an > enhancement that can be done to make journals more compatible with > Btrfs or vice versa. It's not a huge problem anyway. I am still inclined to think that a text file format "append only" would be better than the pseudo database which journald uses. However I have to collect some data to be sure that the perfomance would not be worse. To do that I need some megabytes of log, which I don't have because I don't use the journald file format. Someone would share with me its logs :-) ? > > >> >> The think became interesting when I discovered that the searching in a >> rsyslog file is faster than journalctl (on a rotational media). >> Unfortunately I don't have any data to support this. > > > Yes on drives all of these scattered extents cause a lot of head > seeking. And I also suspect it's a lot of metadata spread out > everywhere too, to account for all of these extents. That's why they > moved to chattr +C to make them nocow. An idea I had on systemd list > was to automatically make the journal directory a Btrfs subvolume, > similar to how systemd already creates a /var/lib/machines subvolume > for nspawn containers. This prevents the journals from being caught up > in a snapshot of the parent subvolume that typically contains the > journals (root fs). There's no practical use I can think of for > snapshotting logs. You'd really want the logs to always be linear, > contiguous, and never get rolled back. Even if something in the system > does get rolled back, you'd want the logs to show that and continue > on, rather than being rolled back themselves. > > So the super simple option would be continue with +C on journals, and > then a separate subvolume to prevent COW from ever happening > inadvertently. > > The same behavior happens with NTFS in qcow2 files. They quickly end > up with 100,000+ extents unless set nocow. It's like the worst case > scenario. > -- gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it> Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html