On 10/11/2012 01:15 PM, Илья Шипицин wrote:
2012/10/11 Jiri B <ji...@devio.us>

On Thu, Oct 11, 2012 at 09:29:50PM +0600, Илья Шипицин wrote:

there are HTTP access logs covering half a year.

this is a trivial case where using multiple file systems works wonderfully.

it's easier to rotate them on a single filesystem from many points of
view,

easier ONLY in the "didn't have to think about anything" sense. Not in the "I'll be ripping my hair out over and over again" sense. Doing it wrong is usually very easy...initially.

we also share it via samba (very tricky to share many chunks).

actually, no.

/log       shared here.  Only this is shared.
/log/a   (full, ro)
    /b   (full, ro)
    /c   (partly full, rw)
    /d   (empty, waiting to be used, rw)
    /curr -> sym link to the active chunk -- in this case, /log/c

/smb/[a..d] are individual file systems.
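A minimal smb.conf sketch of that arrangement (the share name and paths are hypothetical; only /log is exported, and the per-chunk filesystems mounted under it just appear as subdirectories to clients):

```ini
[logs]
    path = /log
    browseable = yes
    read only = yes
    ; the curr -> c symlink stays inside the share, so Samba's default
    ; "follow symlinks = yes" is enough for clients to traverse it
```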


and it is a bad idea to mount access logs R/O; it makes them difficult to rotate.

actually, your archival copies should be RO, if you are required to retain them for legal or security reasons. You don't want them changing...you probably want secure hashes made to prove they didn't change.

Totally bad design! I remember struggling with backup/restore times
to satisfy SLAs on huge filesystems with many files... And those
were logs.

One of the proposals we made was to split the filesystem into smaller
ones and keep the old logs on read-only filesystems. Backup of those
would be skipped, and restore (in this case it was TSM) would be much
faster if an image restore were used.

j.



they are not "old" logs.
generally, today's log is access.log, yesterday's is "access.log.0",
and so on.
every rotation renames all the logs; the oldest are removed.
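That rename cascade looks roughly like this (a sketch, not anyone's actual script; the demo runs in a scratch directory, and the retention depth is a made-up number):

```shell
#!/bin/sh
# Sketch of the rename-based rotation described above; runs in a
# scratch directory for the demo, with an assumed retention depth.
LOGDIR=$(mktemp -d)
KEEP=3                          # retention depth (an assumption)
cd "$LOGDIR" || exit 1

# sample logs for the demo
echo today     > access.log
echo yesterday > access.log.0
echo oldest    > "access.log.$KEEP"

# rotate: drop the oldest, shift everything else up by one
rm -f "access.log.$KEEP"
i=$KEEP
while [ "$i" -gt 0 ]; do
    prev=$((i - 1))
    if [ -f "access.log.$prev" ]; then
        mv "access.log.$prev" "access.log.$i"
    fi
    i=$prev
done
mv access.log access.log.0
# a real setup would now tell the server to reopen access.log,
# e.g. kill -USR1 on the httpd master process
ls
```

Note that every file is renamed on every rotation, which is exactly why it only stays cheap while everything lives on one filesystem.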

too many tricks with r/o filesystems.

also, rotating logs within a single filesystem is cheap: the data is
not moved.
and what happens when I want to move/rotate many, many gigabytes of
logs under this "better design" with many chunks?
I guess that is a hard (and pretty useless) operation from the
filesystem's point of view.

incorrect.

ok, I can change the web server's config to store logs in a different
location every day. you call that "better design"??


First solution that leaps to my mind: move your logging to syslog, and send the syslog output to another machine. Now, the availability of your logging system doesn't impact the availability of your webserver.
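One way to wire that up (a sketch only; the facility, tag, and hostname are assumptions, not from this thread) is to pipe the webserver's access log into syslog and forward that facility to the log host:

```
# on the webserver, in httpd.conf: pipe the access log through logger
#   CustomLog "|/usr/bin/logger -t httpd -p local6.info" combined
# in /etc/syslog.conf on the webserver: forward that facility
local6.*    @loghost.example.com
```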

Set up your logging server to log to /log/curr. That's a symlink to a particular chunk of disk. At midnight, a little script runs: it checks whether you are within a couple of days of running out of space on the current archive chunk, and if so, it changes the symlink to the next recording partition (note: files already open on the old chunk will stay open, so be ready for that; the symlink could also point to a directory within the partition). You can do this in a fixed rotation; I prefer a predefined "use this next" list, as I've had to take storage off-line that I wasn't likely to need but was required to retain.
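A sketch of such a midnight script (everything here is hypothetical: the demo builds the /log layout in a scratch directory, and the free-space threshold is set impossibly high so the switch is forced):

```shell
#!/bin/sh
# Sketch of the chunk-switch check described above. For the demo, the
# layout is built in a scratch dir standing in for /log, and the
# threshold guarantees a flip.
LOG=$(mktemp -d)                 # stands in for /log
mkdir "$LOG/a" "$LOG/b" "$LOG/c" "$LOG/d"
ln -s c "$LOG/curr"              # active chunk, as in the layout above

FREE_MIN=9999999999999           # KB; "a couple days" headroom in real life
CHUNKS="a b c d"                 # predefined "use this next" order

cur=$(basename "$(readlink "$LOG/curr")")
free=$(df -Pk "$LOG/$cur" | awk 'NR==2 {print $4}')
if [ "$free" -lt "$FREE_MIN" ]; then
    # pick the chunk after the current one, wrapping around
    next=$(echo "$CHUNKS $CHUNKS" | tr ' ' '\n' |
           awk -v c="$cur" 'found {print; exit} $0 == c {found=1}')
    ln -sfn "$next" "$LOG/curr"
    # files already open on the old chunk stay open there;
    # tell the logger to reopen, e.g. HUP syslogd
fi
echo "active chunk: $(readlink "$LOG/curr")"
```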


Another solution: If you don't like remote syslogging (i.e., you absolutely have to retain every line of access, you can't tolerate losing log data when you reboot the log machine, and you don't want to use a buffering log agent app), you could simply scp off the old log files. Generate an SHA-256 hash for each file when it is rotated out; when you see the hash file appear, copy the file and its hash over to the log storage machine, verify the hash there, and if it matches, delete the file from the source machine. If it doesn't match, re-copy the file next time 'round.
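A sketch of that copy-verify-delete step (hypothetical names throughout; for a local, testable demo, cp stands in for scp and two scratch directories stand in for the two machines):

```shell
#!/bin/sh
# Sketch of the copy-verify-delete scheme described above.
# cp stands in for scp so the demo runs on one machine.
SRC=$(mktemp -d)                 # rotated-out logs on the webserver
DST=$(mktemp -d)                 # archive on the log storage machine

printf 'GET / 200\n' > "$SRC/access.log.0"      # a rotated-out log
( cd "$SRC" && sha256sum access.log.0 > access.log.0.sha256 )

f=access.log.0
cp "$SRC/$f" "$SRC/$f.sha256" "$DST/"           # scp in real life
if ( cd "$DST" && sha256sum -c "$f.sha256" >/dev/null 2>&1 ); then
    # verified on the far side: safe to delete the originals
    rm "$SRC/$f" "$SRC/$f.sha256"
else
    echo "hash mismatch; will re-copy $f next round" >&2
fi
```

The retained .sha256 files double as the "secure hashes made to prove they didn't change" mentioned earlier.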

Really, simple stuff. Much simpler than trying to manage data in one big chunk. What do you plan to do when 7TB isn't enough to retain your required six months of data? How do you back it all up? How do you restore it when the array barfs?

If you wish to upgrade your logging capability, build out a new logging system, point the systems at it, mothball the old system and when your retention period is over, wipe the old system (look ma! no copying terabytes of data!).

I know some people who are trying to manage many terabytes of fast-moving data in one chunk. They started with FreeBSD and ZFS, but had problems with it (and a definite Linux bias), so they jumped to Linux, and again are finding that big filesystems are difficult. It would be so much easier, for so many reasons, if they just "chunked" their data across multiple filesystems... Ah well...

Nick.
