Re: Audit logging to tables.

Sagar Fri, 01 Mar 2019 17:26:20 -0800

Thanks all for the pointers. Really insightful.

Subroto I think that’s part of the enterprise version but yeah even I have
seen it. Again not sure of the performance implications.


Sagar.

On Sat, 2 Mar 2019 at 5:15 AM, Subroto Barua <sbarua...@yahoo.com.invalid>
wrote:

> Datastax version has an option to store audit info to dse_audit.audit_log
> table; I do not know the performance impact since I use the file option
>
> Subroto
>
> > On Mar 1, 2019, at 9:40 AM, Jeremiah D Jordan <jeremiah.jor...@gmail.com>
> wrote:
> >
> > AFAIK the Full Query Logging binary format was already made more general
> in order to support using that format for the audit logging.
> >
> > -Jeremiah
> >
> >> On Mar 1, 2019, at 11:38 AM, Joshua McKenzie <jmcken...@apache.org>
> wrote:
> >>
> >> Is there a world in which a general purpose, side-channel file storage
> >> format for transient things like this (hints, batches, audit logs, etc)
> >> could be useful as a first class citizen in the codebase? i.e. a world
> in
> >> which we refactored some of the hints-specific reader/writer code to be
> >> used for things like this if/when they come up?
> >>
> >>> On Thu, Feb 28, 2019 at 12:04 PM Jonathan Haddad <j...@jonhaddad.com
> <mailto:j...@jonhaddad.com>> wrote:
> >>>
> >>> Agreed with Dinesh and Josh.  I would *never* put the audit log back in
> >>> Cassandra.
> >>>
> >>> This is extendable, Sagar, so you're free to do as you want, but I'm
> very
> >>> opposed to putting a ticking time bomb in Cassandra proper.
> >>>
> >>> Jon
> >>>
> >>>
> >>> On Thu, Feb 28, 2019 at 8:38 AM Dinesh Joshi
> <djos...@icloud.com.invalid>
> >>> wrote:
> >>>
> >>>> I strongly echo Josh’s sentiment. Imagine losing audit entries
> because C*
> >>>> is overloaded? It’s fine if you don’t care about losing audit entries.
> >>>>
> >>>> Dinesh
> >>>>
> >>>>> On Feb 28, 2019, at 6:41 AM, Joshua McKenzie <jmcken...@apache.org>
> >>>> wrote:
> >>>>>
> >>>>> One of the things we've run into historically, on a *lot* of axes, is
> >>>> that
> >>>>> "just put it in C*" for various functionality looks great from a user
> >>> and
> >>>>> usability perspective, and proves to be something of a nightmare from
> >>> an
> >>>>> admin / cluster behavior perspective.
> >>>>>
> >>>>> i.e. - cluster suffering so you're writing hints? Write them to C*
> >>> tables
> >>>>> and watch the cluster suffer more! :)
> >>>>> Same thing probably holds true for audit logging - at a time frame
> when
> >>>>> things are getting hairy w/a cluster, if you're writing that audit
> >>>> logging
> >>>>> into C* proper (and dealing with ser/deser, compaction pressure,
> >>> flushing
> >>>>> pressure, etc) from that, there's a compounding effect of pressure
> and
> >>>> pain
> >>>>> on the cluster.
> >>>>>
> >>>>> So the TL;DR we as a project kind of philosophically have been moving
> >>>>> towards (I think that's valid to say?) is: use C* for the things it's
> >>>>> absolutely great at, and try to side-channel other recovery
> operations
> >>> as
> >>>>> much as you can (see: file-based hints) to stay out of its way.
> >>>>>
> >>>>> Same thing held true w/design of CDC - I debated "materialize in
> memory
> >>>> for
> >>>>> consumer to take over socket", and "keep the data in another C*
> table",
> >>>> but
> >>>>> the ramifications to perf and core I/O operations in C* the moment
> >>> things
> >>>>> start to go badly were significant enough that the route we went was
> >>> "do
> >>>> no
> >>>>> harm". For better or for worse, as there's obvious tradeoffs there.
> >>>>>
> >>>>>> On Thu, Feb 28, 2019 at 7:46 AM Sagar <sagarmeansoc...@gmail.com>
> >>>> wrote:
> >>>>>>
> >>>>>> Thanks all for the pointers.
> >>>>>>
> >>>>>> @Joseph,
> >>>>>>
> >>>>>> I have gone through the links shared by you. Also, I have been
> looking
> >>>> at
> >>>>>> the code base.
> >>>>>>
> >>>>>> I understand the fact that pushing the logs to ES or Solr is a lot
> >>>> easier
> >>>>>> to do. Having said that, the only reason I thought having something
> >>> like
> >>>>>> this might help is, if I don't want to add more pieces and still
> >>>> provide a
> >>>>>> central piece of audit logging within Cassandra itself and still be
> >>>>>> queryable.
> >>>>>>
> >>>>>> In terms of usages, one of them could definitely be CDC related use
> >>>> cases.
> >>>>>> With data being stored in tables and being queryable, it can become
> a
> >>>> lot
> >>>>>> more easier to expose this data to external systems like Kafka
> >>> Connect,
> >>>>>> Debezium which have the ability to push data to Kafka for example.
> >>> Note
> >>>>>> that pushing data to Kafka is just an example, but what I mean is,
> if
> >>> we
> >>>>>> can have data in tables, then instead of everyone writing custom
> >>> custom
> >>>>>> loggers, they can hook into this table info and take action.
> >>>>>>
> >>>>>> Regarding the infinite loop question, I have done some analysis, and
> >>> in
> >>>> my
> >>>>>> opinion, instead of tweaking the behaviour of Binlog and the way it
> >>>>>> functions currently, we can actually spin up another tailer thread
> to
> >>>> the
> >>>>>> same Chronicle Queue which can do the needful. This way the config
> >>>> options
> >>>>>> etc all remain the same(apart from the logger ofcourse).
> >>>>>>
> >>>>>> Let me know if any of it makes sense :D
> >>>>>>
> >>>>>> Thanks!
> >>>>>> Sagar.
> >>>>>>
> >>>>>>
> >>>>>> On Thu, Feb 28, 2019 at 1:09 AM Dinesh Joshi
> >>> <djos...@icloud.com.invalid
> >>>>>
> >>>>>> wrote:
> >>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>> On Feb 27, 2019, at 10:41 AM, Joseph Lynch <joe.e.ly...@gmail.com
> >
> >>>>>>> wrote:
> >>>>>>>>
> >>>>>>>> Vinay can confirm, but as far as I am aware we have no current
> plans
> >>>> to
> >>>>>>>> implement audit logging to a table directly, but the
> implementation
> >>> is
> >>>>>>>> fully pluggable (like compaction, compression, etc ...). Check out
> >>> the
> >>>>>>> blog
> >>>>>>>> post [1] and documentation [2] Vinay wrote for more details, but
> the
> >>>>>>> short
> >>>>>>>
> >>>>>>> +1. I am still curious as to why you'd want to store audit log
> >>> entries
> >>>>>>> back in Cassandra? Depending on the scale it can generate a lot of
> >>> load
> >>>>>> and
> >>>>>>> I think you'd end up in an infinite loop because as you're
> inserting
> >>>> the
> >>>>>>> audit log entry you'll generate a new one and so on unless you
> black
> >>>> list
> >>>>>>> audits to that table / keyspace.
> >>>>>>>
> >>>>>>> Ideally you'd insert this data into ElasticSearch / Solr or some
> >>> other
> >>>>>>> place that can be then used for analytics or search.
> >>>>>>>
> >>>>>>> Dinesh
> >>>>>>>
> ---------------------------------------------------------------------
> >>>>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> >>>>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>
> >>>>
> >>>> ---------------------------------------------------------------------
> >>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> >>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
> >>>>
> >>>>
> >>>
> >>> --
> >>> Jon Haddad
> >>>
> https://urldefense.proofpoint.com/v2/url?u=http-3A__www.rustyrazorblade.com&d=DwIFaQ&c=adz96Xi0w1RHqtPMowiL2g&r=CNZK3RiJDLqhsZDG6FQGnXn8WyPRCQhp4x_uBICNC0g&m=vyXA1unA3gpHGCpKOfRurmET3jOHaV2bjs1mHVVsb2U&s=EDg90XhABktX19m4FaDHKIjFaU2YAHbXjeEGk7Jx6dk&e=
> <
> https://urldefense.proofpoint.com/v2/url?u=http-3A__www.rustyrazorblade.com&d=DwIFaQ&c=adz96Xi0w1RHqtPMowiL2g&r=CNZK3RiJDLqhsZDG6FQGnXn8WyPRCQhp4x_uBICNC0g&m=vyXA1unA3gpHGCpKOfRurmET3jOHaV2bjs1mHVVsb2U&s=EDg90XhABktX19m4FaDHKIjFaU2YAHbXjeEGk7Jx6dk&e=
> >
> >>> twitter: rustyrazorblade
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>
>

Re: Audit logging to tables.

Reply via email to