I tested your branch in the same configuration. First I had to change "logging-config: <root>=DEBUG" again. I thought that if you did "juju bootstrap --debug" that it always left DEBUG level on.
Anyway, something doesn't quite seem right, as I saw: -rw------- 1 syslog adm 500M Sep 16 07:25 all-machines.log ... -rw------- 1 syslog syslog 300M Sep 16 07:21 machine-0-2014-09-16T07-21-02.193.log -rw------- 1 syslog syslog 300M Sep 16 07:25 machine-0-2014-09-16T07-25-14.840.log -rw------- 1 syslog syslog 31M Sep 16 07:25 machine-0.log Which means that machine-0.log has gotten more than 600MB of data for sure (I saw it rotate at least once above this). But there is only a single all-machines.log of only 500MB. As I sat there watching, I did manage to see: -rw------- 1 syslog adm 7.0M Sep 16 07:26 all-machines.log -rw------- 1 syslog adm 513M Sep 16 07:26 all-machines.log.1 ... Which looks correct. But it would seem that something (load?) was causing it to not get all of the machine-0 messages. Maybe it is getting rate limited? I can watch the machine-0.log file fill at about 1MB per second. And then after about 5s I see all-machines.log only go up by a couple of MB. (machine-0 log seemed to go up by about 12MB, and then all-machines.log only went up by 3MB.) Now, I'm also seeing a rediculous amount of logging, and lots of lines like: http://paste.ubuntu.com/8356019/ Which seems to indicate that every 3 seconds the Rsyslog Watcher is dying and causing everything to restart and reconnect and ask for the new credentials. I'm also a bit surprised that we're telling them to connect to a DNS name, and not an IP address, but the resetting is the important bit. I filed https://bugs.launchpad.net/juju-core/+bug/1369900 to track that bug. Anyone care to try to figure out why I'm seeing 3-4x the amount of data in machine-0.log than seems to be ending up in all-machines.log ? John =:-> On Mon, Sep 15, 2014 at 7:13 PM, Wayne Witzel <[email protected]> wrote: > Here is a branch that addresses the concerns and bug. I've tested it > locally and with digitalocean. I would love for it to be tested under the > scaling scenario where you first encountered the issues. > > https://github.com/wwitzel3/juju/tree/ww3-rsyslogd-logrotate > > I've changed the size in logrotate to an arbitrarily small size, since the > rotation is driven by rsyslog. I've also updated the "style" of the > rotation to create a new log file and use postrotate to HUP rsyslog. > > Also I removed compression. I don't think we need it given that we are > only keeping one rotation around and it is size restricted. This will just > make it easier to get at information across both of the log files. > > Wayne > > On Mon, Sep 15, 2014 at 10:30 AM, Nate Finch <[email protected]> > wrote: > >> tl;dr: rsyslog sees 512,000,000 bytes and tells logrotate "time to >> rotate!" and logrotate sees less than 512MB and say "nah, not big enough" >> and rsyslog never writes the logs anymore because the file is too big. >> >> On Mon, Sep 15, 2014 at 10:07 AM, Wayne Witzel < >> [email protected]> wrote: >> >>> When this was being implemented and during the review process the actual >>> size of the rotated files got adjusted. I tested very thoroughly the actual >>> rotation script and also tested the triggering of the rotation script. >>> Enough that I was happy landing the changes. >>> >>> When the sizes were adjusted the same amount of diligence was not >>> applied in testing. This is what introduced the bug. >>> >>> The size mismatch is problem. rsyslogd will stop logging to the file and >>> attempted to execute the script in the outchannel every time it gets a new >>> log message, which runs logrotate, which does nothing because the file >>> isn't big enough. >>> >>> Here is the rsyslog docs for implementation of sized based rotation. >>> http://www.rsyslog.com/doc/log_rotation_fix_size.html >>> >>> I first attempted use the mv command, but with our rsyslogd >>> configuration, when I performed a mv, rsyslogd would stop logging to the >>> file until I actually did a reload on the process. Also this didn't easily >>> support things like compression or adding more archives if we decided to >>> keep more than one rotation around. So the choice to use logrotate is was >>> made. >>> >>> As for the actual act of reloading rsyslog. You can use copytruncate to >>> avoid it all together or you can use postrotate to reload rsyslogd. In both >>> cases there is a small window of possible data loss. Though with >>> copyandtruncate it is more likley to happen when the system is under load. >>> So changing to a postrotate that reloads rsyslogd is probably a good idea. >>> >>> The rotate setting is just the number of files to keep around. I asked >>> some people after the size update if 1 rotation, a total of 1GB was enough >>> to keep around. Most people thought it was fine, since the original purpose >>> of the sized based long rotation was to address a ticket that was >>> complaining about the juju log folder taking up 3GB+ of space. >>> >>> I will get a ticket created to address the size mismatch issue and >>> switch the copyandtruncate approach to a postrotate of rsyslogd. >>> >>> Wayne >>> >>> On Mon, Sep 15, 2014 at 5:30 AM, John Meinel <[email protected]> >>> wrote: >>> >>>> So we are using rsyslog.conf to have it figure out when rotation needs >>>> to be done with: >>>> # Maximum size for the log on this outchannel is 512MB >>>> # The command to execute when an outchannel as reached its size limit >>>> cannot accept any arguments >>>> # that is why we have created the helper script for executing logrotate. >>>> $outchannel >>>> logRotation,{{logDir}}/all-machines.log,512000000,{{logrotateHelperPath}} >>>> >>>> I would think that would not require SIGHUP along with having it run >>>> the script at our request. >>>> >>>> John >>>> =:-> >>>> >>>> On Mon, Sep 15, 2014 at 12:13 PM, Stuart Bishop < >>>> [email protected]> wrote: >>>> >>>>> On 15 September 2014 12:38, John Meinel <[email protected]> >>>>> wrote: >>>>> >>>>> > 7) "copytruncate" seems the wrong setting for interactive with >>>>> rsyslog. I >>>>> > believe rsyslog is already aware that the file needs to be rotated, >>>>> and thus >>>>> >>>>> It is only aware if you sent it a HUP signal. >>>>> >>>>> > it shouldn't be trying to write to the same file handle (and thus we >>>>> don't >>>>> > need to truncate in place). I'm not 100% sure on the interactions >>>>> here, but >>>>> > "copytruncate" seems to have an inherent likelyhood of dropping data >>>>> (while >>>>> > you are copying, if any data gets written then you'll miss those >>>>> last few >>>>> > bytes when you go to truncate, right?) >>>>> >>>>> -- >>>>> Stuart Bishop <[email protected]> >>>>> >>>> >>>> >>>> -- >>>> Juju-dev mailing list >>>> [email protected] >>>> Modify settings or unsubscribe at: >>>> https://lists.ubuntu.com/mailman/listinfo/juju-dev >>>> >>>> >>> >>> >>> -- >>> Wayne Witzel III >>> [email protected] >>> >>> -- >>> Juju-dev mailing list >>> [email protected] >>> Modify settings or unsubscribe at: >>> https://lists.ubuntu.com/mailman/listinfo/juju-dev >>> >>> >> > > > -- > Wayne Witzel III > [email protected] >
-- Juju-dev mailing list [email protected] Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/juju-dev
