We run a Redis-shielded ELK stack: Redis acts as a buffer against
fluctuating Elasticsearch ingestion performance and downtime. Apps use
log4j2-redis-appender <https://github.com/vy/log4j2-redis-appender>, not
the official Log4j one. That said, I am lobbying for replacing Redis with
Google Cloud Pub/Sub, which would mean one less component for us to manage.
(Yes, my Google Cloud Pub/Sub appender PR is on the horizon!)

I have used Elasticsearch heavily and professionally for 5+ years, both for
"search" (as in "Google search") and as a log sink. IMHO, for both use
cases, it is the best F/OSS tool available, and it delivers. I did not
understand your remark that Elasticsearch is geared toward "relatively
short time period" retention. I have seen deployments spanning thousands of
nodes with a couple of years of retention. It just works.

If you are in the cloud, there are pretty good log sink solutions too,
e.g., Google Cloud Logging. It takes care of your worries about retention
and maintenance, provided you are willing to pay for that.

If you need logging for auditing purposes, e.g., "mark this money transfer
as completed", you are doing it wrong, I think. Most of the time, the whole
shebang that happens after your log() statement is executed asynchronously
at many layers, hence failures don't propagate back to the caller. For one,
all Log4j threads are daemon threads and will be killed upon a JVM exit
without flushing their buffers. Granted, this is a controversial subject,
and one can possibly engineer a reliable logging infra. Yet, again, I think
logging is the wrong tool for this job.
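
To make the "failures don't propagate back" point concrete, here is a
minimal sketch. It is NOT how Log4j is implemented; AsyncLogDemo, its
queue, and the always-failing write() are all made up for illustration.
It only mimics the general shape of an asynchronous appender: log() hands
the event to a queue and returns, and a daemon thread does the actual
write later.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical async logger; a sketch, not Log4j's implementation. */
public final class AsyncLogDemo {

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final AtomicInteger dropped = new AtomicInteger();
    private final CountDownLatch firstDrop = new CountDownLatch(1);

    public AsyncLogDemo() {
        Thread writer = new Thread(this::drain, "log-writer");
        // Daemon threads do not keep the JVM alive: on exit, whatever is
        // still sitting in the queue is simply gone.
        writer.setDaemon(true);
        writer.start();
    }

    /** Returns as soon as the event is queued; the caller learns nothing more. */
    public void log(String event) {
        queue.add(event);
    }

    private void drain() {
        try {
            while (true) {
                String event = queue.take();
                try {
                    write(event);
                } catch (RuntimeException e) {
                    // The failure ends here, far away from the log() caller.
                    dropped.incrementAndGet();
                    firstDrop.countDown();
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Stand-in for an unreachable log store (assumption: it always fails). */
    private void write(String event) {
        throw new IllegalStateException("sink unavailable");
    }

    /** Waits until the background thread has lost at least one event. */
    public boolean awaitDrop(long millis) {
        try {
            return firstDrop.await(millis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public int droppedCount() {
        return dropped.get();
    }

    public static void main(String[] args) {
        AsyncLogDemo logger = new AsyncLogDemo();
        logger.log("mark this money transfer as completed");
        System.out.println("log() returned without any error");
        logger.awaitDrop(1000);
        System.out.println("events lost in the background: " + logger.droppedCount());
    }
}
```

The caller of log() has no way to tell that the "audit record" never
reached its destination, which is exactly what you don't want for audit
data.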

Regarding DBMS log sinks, e.g., PostgreSQL, MySQL, MongoDB, Cassandra, they
are good for persistence, scrolling through records, etc., but not for
aggregation queries. In my experience, they fall short on two main issues:
1) Many users resort to queries combined with aggregations (e.g., show me a
histogram of mdc.httpStatusCode in the last month for this long query of
mine) and RDBMSes are tremendously slow compared to Elasticsearch/Lucene
for such queries. One can argue that this is abusing logging for metrics.
Yet, there it is. 2) Certain RDBMSes are darn difficult to scale
horizontally, unless the product provides that out-of-the-box.
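
For illustration, the kind of request I mean in 1) could be sketched as
below. This just assembles an Elasticsearch search body by hand. The
@timestamp field, the by_status aggregation name, and the sample query
string are assumptions on my side, not something from our actual setup,
and I use a terms aggregation as the simplest way to get per-status-code
counts. The escape() helper is deliberately minimal and is no substitute
for a real JSON library.

```java
/** Hypothetical helper that builds a "status code histogram" search body. */
public final class StatusCodeHistogramQuery {

    /** Escapes only backslashes and double quotes; enough for this sketch. */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /**
     * @param userQuery the user's own (possibly long) query_string query
     * @param field     the field to count by, e.g. mdc.httpStatusCode
     * @param since     a date math expression, e.g. now-1M for the last month
     */
    static String build(String userQuery, String field, String since) {
        return "{"
            // size 0: we want only the aggregation buckets, not the hits.
            + "\"size\":0,"
            + "\"query\":{\"bool\":{\"filter\":["
            +   "{\"query_string\":{\"query\":\"" + escape(userQuery) + "\"}},"
            +   "{\"range\":{\"@timestamp\":{\"gte\":\"" + escape(since) + "\"}}}"
            + "]}},"
            + "\"aggs\":{\"by_status\":{\"terms\":{\"field\":\"" + escape(field) + "\"}}}"
            + "}";
    }

    public static void main(String[] args) {
        // The query string below is a made-up example.
        System.out.println(build(
            "mdc.service:checkout AND level:ERROR",
            "mdc.httpStatusCode",
            "now-1M"));
    }
}
```

The resulting JSON would be sent to the _search endpoint of the relevant
index. The filtering and the bucketing happen in one round trip, which is
the part that gets painful on a general-purpose DBMS.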

On Tue, Aug 3, 2021 at 6:50 PM Matt Sicker <[email protected]> wrote:

> Hey all, I have a somewhat practical question related to logging here.
> For those of you maintaining a structured event log or audit log of
> some sort, what types of event log stores are you using to append them
> to? I feel like solutions like Splunk, ELK, etc., are geared toward
> diagnostic logs which don't necessarily need retention beyond a
> relatively short time period. On the other hand, one of the more
> natural append-only storage solutions I can think of is Kafka, though
> that, too, isn't really geared toward long term storage (even if I can
> theoretically fit the entire audit log on one machine). I've been
> considering potentially using Cassandra here for durability and append
> speed, but even that seems overkill since I don't want or need to be
> able to ever update a log event after it's been stored. I've also
> considered having Kafka as a layer in between, but that just feels
> like overengineering as I don't expect event logs to populate nearly
> as fast as, say, wind turbine sensor data where I last used that
> architectural pattern.
>
> I'm curious if anyone has experience with building their own event log
> storage service or using an existing one along with any advice.
>
