Redis is a great fit for the buffering scheme I mentioned. It is rock
solid, battle tested, scalable, and reliable. It is, however, yet another
component you need to manage, unless it is already part of your company's
or cloud provider's SaaS offerings. At bol.com, Redis is an outlier: the
company platform supports it, but not as well as, say, PostgreSQL. Google
Cloud offers Redis as a managed service (Memorystore), but it falls short
on fundamental authentication and authorization features, so it is not an
option for us. You can also deploy Redis on Kubernetes yourself, but then
maintenance becomes an issue again.

In conclusion, I strongly advise shielding Elasticsearch from your logging
load with an external buffer. Depending on the commodity buffering solution
in your context (Pub/Sub, Redis, Kafka, etc.), how you realize this buffer
varies.
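
For illustration, here is a minimal consumer-side sketch of such a buffer
in Java, assuming the appender LPUSHes JSON-serialized events onto a Redis
list; the list name, hosts, index, and the Jedis client choice below are
mine, not prescriptive:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import redis.clients.jedis.Jedis;

public final class RedisLogDrainer {

    public static void main(String[] args) throws Exception {
        HttpClient http = HttpClient.newHttpClient();
        try (Jedis redis = new Jedis("redis-host", 6379)) {
            while (true) {
                // BRPOP blocks until an event arrives; the list absorbs
                // bursts while Elasticsearch ingestion lags or is down.
                List<String> popped = redis.brpop(0, "log-events");
                String eventJson = popped.get(1); // [key, value]
                // The _bulk API expects an action line, then the document.
                String body = "{\"index\":{}}\n" + eventJson + "\n";
                HttpRequest request = HttpRequest.newBuilder()
                        .uri(URI.create("http://es-host:9200/logs/_bulk"))
                        .header("Content-Type", "application/x-ndjson")
                        .POST(HttpRequest.BodyPublishers.ofString(body))
                        .build();
                http.send(request, HttpResponse.BodyHandlers.ofString());
            }
        }
    }
}

A real drainer would batch events into larger _bulk requests, check the
response for per-item errors, and re-queue events on failure instead of
dropping them.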

On Wed, Aug 4, 2021 at 7:00 PM Matt Sicker <[email protected]> wrote:

> My current setup is essentially a small modification to the EcsLayout
> in JsonTemplateLayout (to ignore certain MDC values) feeding a rolling
> file appender, with a sidecar container that forwards log files to Splunk.
> The event log I'm considering here does not directly involve finances
> (you could stretch things by saying that using compute and storage
> resources is similar to spending money, but that's not what we're
> tracking). Another potential idea I've considered is dumping the JSON
> files to S3 or similar and then using standard data lake stuff for
> querying, though that just feels like Splunk with extra steps.
>
> How effective have you found using Redis or other non-durable stores
> in between the source and sink of logs?
>
> On Wed, Aug 4, 2021 at 11:34 AM Dominik Psenner <[email protected]> wrote:
> >
> > Hi
> >
> > I like JSON as the transport format over MQTT. A few hundred lines of
> > code persist events to a PostgreSQL database by calling a stored
> > procedure that handles the JSON directly. The stored procedure allows
> > plugging in fancy aggregation logic, distribution across tables, or
> > even retention during persistence, applied gradually with very little
> > impact. This avoids the hours of downtime needed to delete millions of
> > records, which would otherwise require a rebuild of the table. The
> > stored procedure can even be modified at runtime. And when the
> > requirements become too tough for the database to handle, the MQTT
> > pub/sub system allows plugging in additional processors like
> > aggregators, alerters, ...
> >
> > Warm regards,
> > Dominik
> > --
> > Sent from my phone. Typos are a kind gift to anyone who happens to find
> > them.
> >
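For concreteness, a minimal Java sketch of the MQTT-to-PostgreSQL path
Dominik describes, assuming the database side is exposed as a stored
function persist_event(jsonb); the broker URL, topic, credentials, and
function name below are made up, not his actual setup:

import java.nio.charset.StandardCharsets;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import org.eclipse.paho.client.mqttv3.MqttClient;

public final class EventPersister {

    public static void main(String[] args) throws Exception {
        Connection db = DriverManager.getConnection(
                "jdbc:postgresql://db-host/events", "app", "secret");
        MqttClient mqtt = new MqttClient("tcp://mqtt-host:1883", "event-persister");
        mqtt.connect();
        // Each message is handed straight to the stored function, which
        // can aggregate, route across tables, or apply retention without
        // redeploying the application.
        mqtt.subscribe("events/#", (topic, message) -> {
            String json = new String(message.getPayload(), StandardCharsets.UTF_8);
            try (PreparedStatement call =
                    db.prepareStatement("select persist_event(?::jsonb)")) {
                call.setString(1, json);
                call.execute();
            }
        });
    }
}

Paho delivers messages on a single thread by default, so sharing one
connection here is tolerable for a sketch; a connection pool would be
safer in practice.
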
> > On Wed, Aug 4, 2021, 10:04 Volkan Yazıcı <[email protected]> wrote:
> >
> > > We have our Redis-shielded ELK stack – Redis acts as a buffer against
> > > fluctuating Elasticsearch ingestion performance and downtime. Apps use
> > > log4j2-redis-appender <https://github.com/vy/log4j2-redis-appender>,
> > > not the official Log4j one. That said, I am lobbying to replace Redis
> > > with Google Cloud Pub/Sub – one less managed component to worry about.
> > > (Yes, my Google Cloud Pub/Sub appender PR is on the horizon!)
> > >
> > > I have heavily used Elasticsearch for both "search" (as in "Google
> > > search") and log sink purposes, professionally, for 5+ years. IMHO, in
> > > both use cases, it is the best tool the F/OSS market delivers. I did
> > > not understand your remark that Elasticsearch is geared toward
> > > "relatively short time period" retention. I have seen deployments
> > > spanning thousands of nodes with a couple of years of retention. It
> > > just works.
> > >
> > > If you are in the cloud, there are pretty good log sink solutions too,
> > > e.g., Google Cloud Logging. All your worries about retention and
> > > maintenance will be perfectly addressed there, granted you are willing
> > > to pay for that.
> > >
> > > If you need logging for auditing purposes, e.g., "mark this money
> > > transfer as completed", you are doing it wrong, I think. Most of the
> > > time, the whole shebang that happens after your log() statement is
> > > executed asynchronously at many layers, hence failures don't propagate
> > > back. For one, all Log4j threads are daemon threads and will be killed
> > > upon a JVM exit without flushing their buffers. Admittedly, this is a
> > > controversial subject, and one can engineer a reliable logging infra;
> > > yet, again, I think this is the wrong tool for the job.
> > >
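To make the daemon-thread caveat concrete, a tiny sketch using the
standard Log4j API (the class and message are illustrative):

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

public final class AuditedTransfer {

    private static final Logger LOGGER = LogManager.getLogger(AuditedTransfer.class);

    public static void main(String[] args) {
        LOGGER.info("money transfer completed"); // may still sit in an async buffer
        // Without an explicit shutdown, Log4j's daemon threads die with
        // the JVM and any buffered events are silently lost.
        LogManager.shutdown(); // flushes and stops the appenders
    }
}
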
> > > Regarding DBMS log sinks, e.g., PostgreSQL, MySQL, MongoDB, Cassandra:
> > > they are good for persistence, scrolling through records, etc., but
> > > not for aggregation queries. In my experience, they fall short in two
> > > main ways: 1) Many users reach for queries combined with aggregations
> > > (e.g., "show me a histogram of mdc.httpStatusCode over the last month
> > > for this long query of mine"), and RDBMSes are tremendously slow
> > > compared to Elasticsearch/Lucene for such queries. One can argue that
> > > this is abusing logging for metrics; yet, there it is. 2) Certain
> > > RDBMSes are darn difficult to scale horizontally, unless that is
> > > provided out-of-the-box.
> > >
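To illustrate the kind of aggregation meant above, a minimal Java sketch
against the Elasticsearch _search API; the index name and host are made
up, and mdc.httpStatusCode is assumed to be mapped as a keyword field:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public final class StatusCodeHistogram {

    public static void main(String[] args) throws Exception {
        // Count of each HTTP status code over the last month.
        String query = """
                {
                  "size": 0,
                  "query": {"range": {"@timestamp": {"gte": "now-1M"}}},
                  "aggs": {
                    "status_codes": {"terms": {"field": "mdc.httpStatusCode"}}
                  }
                }""";
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://es-host:9200/logs/_search"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(query))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}

The equivalent GROUP BY over months of rows is exactly where RDBMSes start
to crawl.
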
> > > On Tue, Aug 3, 2021 at 6:50 PM Matt Sicker <[email protected]> wrote:
> > >
> > > > Hey all, I have a somewhat practical question related to logging
> > > > here. For those of you maintaining a structured event log or audit
> > > > log of some sort, what types of event log stores are you appending
> > > > them to? I feel like solutions like Splunk, ELK, etc., are geared
> > > > toward diagnostic logs, which don't necessarily need retention
> > > > beyond a relatively short time period. On the other hand, one of the
> > > > more natural append-only storage solutions I can think of is Kafka,
> > > > though that, too, isn't really geared toward long-term storage (even
> > > > if I could theoretically fit the entire audit log on one machine).
> > > > I've been considering Cassandra here for durability and append
> > > > speed, but even that seems overkill, since I don't want or need to
> > > > ever update a log event after it's been stored. I've also considered
> > > > having Kafka as a layer in between, but that just feels like
> > > > overengineering, as I don't expect event logs to populate nearly as
> > > > fast as, say, the wind turbine sensor data where I last used that
> > > > architectural pattern.
> > > >
> > > > I'm curious if anyone has experience building their own event log
> > > > storage service, or using an existing one, along with any advice.
