Yup, the tricky part would come on a crash, interrupt, etc. before close,
because I assume the partially compressed file would be irrecoverable (I
haven't verified this). Ideally we'd be able to close it properly, but if
not, the log could be recovered on startup and compressed from the
parallel uncompressed log that was simultaneously being written by
another/the same appender.
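
Roughly what I have in mind for the recovery step, as a rough sketch
(the class name and file paths below are made up, and I haven't run
this):

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.GZIPOutputStream;

class LogRecovery {
    // On startup: if a (possibly truncated) .gz archive exists alongside
    // the parallel uncompressed log, re-compress from the uncompressed
    // copy and swap it in for the damaged archive.
    static void recover(Path plainLog, Path gzLog) throws IOException {
        Path tmp = gzLog.resolveSibling(gzLog.getFileName() + ".recovering");
        try (InputStream in = Files.newInputStream(plainLog);
             GZIPOutputStream out =
                     new GZIPOutputStream(Files.newOutputStream(tmp))) {
            byte[] buf = new byte[32 * 1024];
            int n;
            while ((n = in.read(buf)) > 0) {
                out.write(buf, 0, n);
            }
        }
        Files.move(tmp, gzLog, StandardCopyOption.REPLACE_EXISTING);
    }
}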

That would incur startup time to recover, which may be more acceptable in
the rare case of a crash. Otherwise, if there's another compression
technique that leaves behind readable files even when not closed properly,
that would eliminate the need for recovery.
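
One candidate for that (untested; the file name below is just an
example): Java 7's GZIPOutputStream syncFlush option. If the appender
flushes periodically, everything written up to the last flush should
still be decodable even if the trailer is never written, e.g.:

import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

class SyncFlushGzipSketch {
    public static void main(String[] args) throws IOException {
        // syncFlush=true makes flush() do a zlib SYNC_FLUSH, so bytes
        // written so far can be decompressed even if close() never runs
        // (e.g. after a crash).
        try (GZIPOutputStream gz = new GZIPOutputStream(
                new BufferedOutputStream(
                        new FileOutputStream("app.log.gz")), true)) {
            gz.write("log line 1\n".getBytes(StandardCharsets.UTF_8));
            gz.flush(); // data up to here should survive an unclean shutdown
            gz.write("log line 2\n".getBytes(StandardCharsets.UTF_8));
        }
    }
}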

I'll open a Jira ticket. Thanks for letting me share my thoughts on this.

- David


On Wed, May 28, 2014 at 9:39 AM, Matt Sicker <[email protected]> wrote:

> We can use GZIPOutputStream, DeflaterOutputStream, and ZipOutputStream all
> out of the box.
>
> What happens if you interrupt a stream in progress? No idea! But gzip at
> least has CRC32 checksums on hand, so a corrupted file can be detected.
> We'll have to experiment a bit to see what really happens. I couldn't find
> anything in zlib.net's FAQ.
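>
> One quick experiment (untested sketch, plain java.util.zip) would be to
> stream a truncated file back through GZIPInputStream and count how much
> decompresses cleanly before it throws:
>
> import java.io.EOFException;
> import java.io.FileInputStream;
> import java.io.IOException;
> import java.util.zip.GZIPInputStream;
> import java.util.zip.ZipException;
>
> class GzipSalvageCheck {
>     public static void main(String[] args) throws IOException {
>         long recovered = 0;
>         byte[] buf = new byte[8192];
>         try (GZIPInputStream in =
>                 new GZIPInputStream(new FileInputStream(args[0]))) {
>             int n;
>             while ((n = in.read(buf)) > 0) {
>                 recovered += n; // bytes that decompressed without error
>             }
>         } catch (EOFException | ZipException e) {
>             // truncated stream or bad trailer/CRC: keep what we got so far
>             System.err.println("stream ended early: " + e.getMessage());
>         }
>         System.out.println("recovered " + recovered + " uncompressed bytes");
>     }
> }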
>
>
> On 28 May 2014 08:56, Ralph Goers <[email protected]> wrote:
>
>> What would happen to the file if the system crashed before the file is
>> closed? Would the file be able to be decompressed or would it be corrupted?
>>
>> Sent from my iPad
>>
>> On May 28, 2014, at 6:35 AM, Remko Popma <[email protected]> wrote:
>>
>> David, thank you for the clarification. I understand better what you are
>> trying to achieve now.
>>
>> Interesting idea to have an appender that writes to a GZipOutputStream.
>> Would you mind raising a Jira ticket
>> <https://issues.apache.org/jira/browse/LOG4J2> for that feature request?
>>
>> I would certainly be interested in learning about efficient techniques
>> for compressing very large files. Not sure if or how the dd/direct I/O
>> mentioned in the blog you linked to could be leveraged from Java. If you
>> find a way that works well for log file rollover, and you're interested in
>> sharing it, please let us know.
>>
>>
>>
>> On Wed, May 28, 2014 at 3:42 PM, David Hoa <[email protected]> wrote:
>>
>>> Hi Remko,
>>>
>>> My point about gzip, which we've experienced, is that compressing very
>>> large files (multi-GB) does have considerable impact on the system. The
>>> dd/direct I/O workaround avoids putting that much log data into your
>>> filesystem cache. For that problem, after I sent the email, I did look at
>>> the log4j2 implementation, and saw that in
>>> DefaultRolloverStrategy::rollover() it calls GZCompressionAction, so I see
>>> how I can write my own strategy and Action to customize how gzip is called.
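>>>
>>> For the custom Action, roughly what I have in mind (untested; the
>>> class name is made up and the dd flags would need verifying on our
>>> boxes) is to shell out so the archive write bypasses the filesystem
>>> cache:
>>>
>>> import java.io.File;
>>> import java.io.IOException;
>>>
>>> class DirectIoGzip {
>>>     // Hypothetical body for a custom compress action: pipe gzip into
>>>     // dd with O_DIRECT so the archive bypasses the page cache.
>>>     // Requires GNU dd; flags and block size are assumptions to verify.
>>>     static void compress(File source, File target)
>>>             throws IOException, InterruptedException {
>>>         String cmd = "gzip -c < '" + source + "' | dd of='" + target
>>>                 + "' oflag=direct bs=1M";
>>>         ProcessBuilder pb = new ProcessBuilder("/bin/sh", "-c", cmd);
>>>         pb.inheritIO(); // let dd's transfer stats go to our stderr
>>>         Process p = pb.start();
>>>         if (p.waitFor() != 0) {
>>>             throw new IOException("compression failed for " + source);
>>>         }
>>>     }
>>> }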
>>>
>>> My second question was not about adding to existing gzip files; from
>>> what I know that's not possible. But if the GZipOutputStream is kept open
>>> and written to until closed by a rollover event, then the cost of gzipping
>>> is amortized over time rather than incurred when the rollover event gets
>>> triggered. The benefit is that there's no resource-usage spike at rollover;
>>> the downside would be writing both compressed and uncompressed log files
>>> and maintaining rollover strategies for both of them. So a built-in
>>> appender that wrote directly to gz files would be useful for this.
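>>>
>>> To illustrate the dual-write idea (untested, class name made up), the
>>> appender could write through something like this so both copies stay
>>> in sync:
>>>
>>> import java.io.FileOutputStream;
>>> import java.io.IOException;
>>> import java.io.OutputStream;
>>> import java.util.zip.GZIPOutputStream;
>>>
>>> // Writes every byte to the plain log and to a gzip archive, so the
>>> // compression cost is paid a little at a time instead of all at once
>>> // when rollover triggers.
>>> class TeeGzipOutputStream extends OutputStream {
>>>     private final OutputStream plain;
>>>     private final GZIPOutputStream gz;
>>>
>>>     TeeGzipOutputStream(String plainPath, String gzPath) throws IOException {
>>>         this.plain = new FileOutputStream(plainPath);
>>>         this.gz = new GZIPOutputStream(new FileOutputStream(gzPath));
>>>     }
>>>
>>>     @Override public void write(int b) throws IOException {
>>>         plain.write(b);
>>>         gz.write(b);
>>>     }
>>>
>>>     @Override public void write(byte[] b, int off, int len) throws IOException {
>>>         plain.write(b, off, len);
>>>         gz.write(b, off, len);
>>>     }
>>>
>>>     @Override public void flush() throws IOException {
>>>         plain.flush();
>>>         gz.flush();
>>>     }
>>>
>>>     @Override public void close() throws IOException {
>>>         try (OutputStream p = plain; GZIPOutputStream g = gz) {
>>>             g.finish(); // write the gzip trailer so the archive is complete
>>>         }
>>>     }
>>> }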
>>>
>>> Thanks,
>>> David
>>>
>>>
>>> On Tue, May 27, 2014 at 4:52 PM, Remko Popma <[email protected]> wrote:
>>>
>>>> Hi David,
>>>>
>>>> I read the blog post you linked to. It seems that the author was very,
>>>> very upset that a utility called cp only uses a 512 byte buffer. He then
>>>> goes on to praise gzip for having a 32KB buffer.
>>>> So just based on your link, gzip is actually pretty good.
>>>>
>>>> That said, there are plans to improve the file rollover mechanism.
>>>> These plans are currently spread out over a number of Jira tickets. One
>>>> existing request is to delete archived log files that are older than some
>>>> number of days. (https://issues.apache.org/jira/browse/LOG4J2-656,
>>>> https://issues.apache.org/jira/browse/LOG4J2-524 )
>>>> This could be extended to cover your request to keep M compressed
>>>> files.
>>>>
>>>> I'm not sure about appending to existing gzip files. Why is this
>>>> desirable? What are you trying to accomplish with that?
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On 2014/05/28, at 3:22, David Hoa <[email protected]> wrote:
>>>>
>>>> hi Log4j Dev,
>>>>
>>>> I am interested in the log rollover and compression feature in log4j2.
>>>> I read the documentation online, and still have some questions.
>>>>
>>>> - gzipping large files has a performance impact on latencies/CPU/file
>>>> cache, and there's a workaround for that using dd and direct I/O. Is it
>>>> possible to customize how log4j2 gzips files (or does log4j2 already do
>>>> this)? See this link for a description of the common problem.
>>>>
>>>> http://kevinclosson.wordpress.com/2007/02/23/standard-file-utilities-with-direct-io/
>>>>
>>>> - is it possible to use the existing appenders to output directly to
>>>> their final gzipped files, maintain M of those gzipped files, and
>>>> rollover/maintain N of the uncompressed logs? I suspect that the
>>>> complicated part would be in JVM crash recovery / application restart. Any
>>>> suggestions on how best to add/extend/customize support for this?
>>>>
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>>
>>>>
>>>
>>
>
>
> --
> Matt Sicker <[email protected]>
>
