Hi

I like JSON as the transport format over MQTT. A few hundred lines of code
persist events to a PostgreSQL database by calling a stored procedure that
handles the JSON directly. The stored procedure allows plugging in some
fancy aggregation logic, distribution across tables, or even retention
handling at write time, and, if applied gradually, with very little impact.
This avoids the hours of downtime that deleting millions of records, and
the table rebuild it entails, would otherwise require. The stored procedure
can even be modified at runtime. And when the requirements become too
demanding for the database to handle, the MQTT pub/sub system allows
plugging in additional processors such as aggregators, alerters, ...
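
To make the shape concrete, here is a minimal sketch in Java, assuming the
Eclipse Paho MQTT client and the PostgreSQL JDBC driver are on the
classpath; the topic, connection details, and the persist_event(jsonb)
function are placeholders for the example, not the actual code:

    // Forward every JSON payload from MQTT to a hypothetical PostgreSQL
    // function persist_event(jsonb); all routing, aggregation, and
    // retention logic lives inside the database, not in this bridge.
    import java.nio.charset.StandardCharsets;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import org.eclipse.paho.client.mqttv3.MqttClient;

    public class EventPersister {
        public static void main(String[] args) throws Exception {
            Connection db = DriverManager.getConnection(
                    "jdbc:postgresql://localhost:5432/events", "app", "secret");
            MqttClient mqtt = new MqttClient("tcp://localhost:1883", "event-persister");
            mqtt.connect();
            // Hand each payload to the stored procedure as-is; replacing the
            // procedure body at runtime changes persistence behaviour without
            // touching or redeploying this code.
            mqtt.subscribe("events/#", (topic, message) -> {
                String json = new String(message.getPayload(), StandardCharsets.UTF_8);
                try (PreparedStatement ps =
                        db.prepareStatement("SELECT persist_event(?::jsonb)")) {
                    ps.setString(1, json);
                    ps.execute();
                }
            });
        }
    }

The interesting part is only the call shape: one jsonb argument per event,
with everything else decided inside the procedure.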

Warm regards,
Dominik
--
Sent from my phone. Typos are a kind gift to anyone who happens to find
them.

On Wed, Aug 4, 2021, 10:04 Volkan Yazıcı <[email protected]> wrote:

> We have our Redis-shielded ELK stack – Redis acts as a buffer for
> fluctuating Elasticsearch ingestion performance and downtime. Apps use
> log4j2-redis-appender <https://github.com/vy/log4j2-redis-appender>, not
> the official Log4j one. This said, I am lobbying for replacing Redis with
> Google Cloud Pub/Sub – one less managed component to worry about. (Yes, my
> Google Cloud Pub/Sub appender PR is on the horizon!)
>
> I have heavily used Elasticsearch for both "search" (as in "Google search")
> and log sink purposes, professionally, for 5+ years. IMHO, in both use
> cases, it is the best tool in the F/OSS market that delivers. I did not
> understand your remark that Elasticsearch is geared toward "relatively
> short time period" retention. I have seen deployments spanning thousands of
> nodes with a couple of years of retention. It just works.
>
> If you are in the cloud, there are pretty good log sink solutions too,
> e.g., Google Cloud Logging. All your worries about retention and
> maintenance will be perfectly addressed there, granted you are willing to
> pay for that.
>
> If you need logging for auditing purposes, e.g., "mark this money transfer
> as completed", you are doing it wrong, I think. Most of the time, the
> shebang that happens after your log() statement is executed asynchronously
> at many layers, hence, failures don't propagate back. For one, all Log4j
> threads are daemon threads and will be killed upon a JVM exit without
> flushing their buffers. Indeed this is a controversial subject and one can
> possibly engineer a reliable logging infra, yet, again, I think this is the
> wrong tool for the job.
>
> Regarding DBMS log sinks, e.g., PostgreSQL, MySQL, MongoDB, Cassandra, they
> are good for persistence, scrolling through records, etc., but not for
> aggregation queries. I see two main issues that they fall short of
> addressing in my experience: 1) Many users reach for queries combined
> with aggregations (e.g., show me a histogram of mdc.httpStatusCode in the
> last month for this long query of mine) and RDBMSes are tremendously slow
> compared to Elasticsearch/Lucene for such queries. One can argue that this
> is abusing logging for metrics. Yet, there it is. 2) Certain RDBMSes are
> darn difficult to (horizontally) scale, unless it is provided
> out-of-the-box.
>
> On Tue, Aug 3, 2021 at 6:50 PM Matt Sicker <[email protected]> wrote:
>
> > Hey all, I have a somewhat practical question related to logging here.
> > For those of you maintaining a structured event log or audit log of
> > some sort, what types of event log stores are you using to append them
> > to? I feel like solutions like Splunk, ELK, etc., are geared toward
> > diagnostic logs which don't necessarily need retention beyond a
> > relatively short time period. On the other hand, one of the more
> > natural append-only storage solutions I can think of is Kafka, though
> > that, too, isn't really geared toward long term storage (even if I can
> > theoretically fit the entire audit log on one machine). I've been
> > considering potentially using Cassandra here for durability and append
> > speed, but even that seems overkill since I don't want or need to be
> > able to ever update a log event after it's been stored. I've also
> > considered having Kafka as a layer in between, but that just feels
> > like overengineering as I don't expect event logs to populate nearly
> > as fast as, say, wind turbine sensor data where I last used that
> > architectural pattern.
> >
> > I'm curious if anyone has experience with building their own event log
> > storage service or using an existing one along with any advice.
> >
>
