Re: Audit logging to tables.

Sagar Mon, 04 Mar 2019 03:29:56 -0800

Hi Jonathan,

I couldn't find much literature on Virtual tables apart from this ticket:


https://issues.apache.org/jira/browse/CASSANDRA-7622

Any insights would be helpful.

Thanks!
Sagar.

On Sat, Mar 2, 2019 at 7:23 AM Jonathan Haddad <[email protected]> wrote:

> Instead of logging to tables, putting a virtual table around the audit /
> query logs might be an option. Same with the commit log for cdc
>
> On Fri, Mar 1, 2019 at 5:25 PM Sagar <[email protected]> wrote:
>
> > Thanks all for the pointers. Really insightful.
> >
> > Subroto I think that’s part of the enterprise version but yeah even I
> have
> > seen it. Again not sure of the performance implications.
> >
> > Sagar.
> >
> > On Sat, 2 Mar 2019 at 5:15 AM, Subroto Barua <[email protected]
> >
> > wrote:
> >
> > > Datastax version has an option to store audit info to
> dse_audit.audit_log
> > > table; I do not know the performance impact since I use the file option
> > >
> > > Subroto
> > >
> > > > On Mar 1, 2019, at 9:40 AM, Jeremiah D Jordan <
> > [email protected]>
> > > wrote:
> > > >
> > > > AFAIK the Full Query Logging binary format was already made more
> > general
> > > in order to support using that format for the audit logging.
> > > >
> > > > -Jeremiah
> > > >
> > > >> On Mar 1, 2019, at 11:38 AM, Joshua McKenzie <[email protected]>
> > > wrote:
> > > >>
> > > >> Is there a world in which a general purpose, side-channel file
> storage
> > > >> format for transient things like this (hints, batches, audit logs,
> > etc)
> > > >> could be useful as a first class citizen in the codebase? i.e. a
> world
> > > in
> > > >> which we refactored some of the hints-specific reader/writer code to
> > be
> > > >> used for things like this if/when they come up?
> > > >>
> > > >>> On Thu, Feb 28, 2019 at 12:04 PM Jonathan Haddad <
> [email protected]
> > > <mailto:[email protected]>> wrote:
> > > >>>
> > > >>> Agreed with Dinesh and Josh.  I would *never* put the audit log
> back
> > in
> > > >>> Cassandra.
> > > >>>
> > > >>> This is extendable, Sagar, so you're free to do as you want, but
> I'm
> > > very
> > > >>> opposed to putting a ticking time bomb in Cassandra proper.
> > > >>>
> > > >>> Jon
> > > >>>
> > > >>>
> > > >>> On Thu, Feb 28, 2019 at 8:38 AM Dinesh Joshi
> > > <[email protected]>
> > > >>> wrote:
> > > >>>
> > > >>>> I strongly echo Josh’s sentiment. Imagine losing audit entries
> > > because C*
> > > >>>> is overloaded? It’s fine if you don’t care about losing audit
> > entries.
> > > >>>>
> > > >>>> Dinesh
> > > >>>>
> > > >>>>> On Feb 28, 2019, at 6:41 AM, Joshua McKenzie <
> [email protected]
> > >
> > > >>>> wrote:
> > > >>>>>
> > > >>>>> One of the things we've run into historically, on a *lot* of
> axes,
> > is
> > > >>>> that
> > > >>>>> "just put it in C*" for various functionality looks great from a
> > user
> > > >>> and
> > > >>>>> usability perspective, and proves to be something of a nightmare
> > from
> > > >>> an
> > > >>>>> admin / cluster behavior perspective.
> > > >>>>>
> > > >>>>> i.e. - cluster suffering so you're writing hints? Write them to
> C*
> > > >>> tables
> > > >>>>> and watch the cluster suffer more! :)
> > > >>>>> Same thing probably holds true for audit logging - at a time
> frame
> > > when
> > > >>>>> things are getting hairy w/a cluster, if you're writing that
> audit
> > > >>>> logging
> > > >>>>> into C* proper (and dealing with ser/deser, compaction pressure,
> > > >>> flushing
> > > >>>>> pressure, etc) from that, there's a compounding effect of
> pressure
> > > and
> > > >>>> pain
> > > >>>>> on the cluster.
> > > >>>>>
> > > >>>>> So the TL;DR we as a project kind of philosophically have been
> > moving
> > > >>>>> towards (I think that's valid to say?) is: use C* for the things
> > it's
> > > >>>>> absolutely great at, and try to side-channel other recovery
> > > operations
> > > >>> as
> > > >>>>> much as you can (see: file-based hints) to stay out of its way.
> > > >>>>>
> > > >>>>> Same thing held true w/design of CDC - I debated "materialize in
> > > memory
> > > >>>> for
> > > >>>>> consumer to take over socket", and "keep the data in another C*
> > > table",
> > > >>>> but
> > > >>>>> the ramifications to perf and core I/O operations in C* the
> moment
> > > >>> things
> > > >>>>> start to go badly were significant enough that the route we went
> > was
> > > >>> "do
> > > >>>> no
> > > >>>>> harm". For better or for worse, as there's obvious tradeoffs
> there.
> > > >>>>>
> > > >>>>>> On Thu, Feb 28, 2019 at 7:46 AM Sagar <
> [email protected]>
> > > >>>> wrote:
> > > >>>>>>
> > > >>>>>> Thanks all for the pointers.
> > > >>>>>>
> > > >>>>>> @Joseph,
> > > >>>>>>
> > > >>>>>> I have gone through the links shared by you. Also, I have been
> > > looking
> > > >>>> at
> > > >>>>>> the code base.
> > > >>>>>>
> > > >>>>>> I understand the fact that pushing the logs to ES or Solr is a
> lot
> > > >>>> easier
> > > >>>>>> to do. Having said that, the only reason I thought having
> > something
> > > >>> like
> > > >>>>>> this might help is, if I don't want to add more pieces and still
> > > >>>> provide a
> > > >>>>>> central piece of audit logging within Cassandra itself and still
> > be
> > > >>>>>> queryable.
> > > >>>>>>
> > > >>>>>> In terms of usages, one of them could definitely be CDC related
> > use
> > > >>>> cases.
> > > >>>>>> With data being stored in tables and being queryable, it can
> > become
> > > a
> > > >>>> lot
> > > >>>>>> more easier to expose this data to external systems like Kafka
> > > >>> Connect,
> > > >>>>>> Debezium which have the ability to push data to Kafka for
> example.
> > > >>> Note
> > > >>>>>> that pushing data to Kafka is just an example, but what I mean
> is,
> > > if
> > > >>> we
> > > >>>>>> can have data in tables, then instead of everyone writing custom
> > > >>> custom
> > > >>>>>> loggers, they can hook into this table info and take action.
> > > >>>>>>
> > > >>>>>> Regarding the infinite loop question, I have done some analysis,
> > and
> > > >>> in
> > > >>>> my
> > > >>>>>> opinion, instead of tweaking the behaviour of Binlog and the way
> > it
> > > >>>>>> functions currently, we can actually spin up another tailer
> thread
> > > to
> > > >>>> the
> > > >>>>>> same Chronicle Queue which can do the needful. This way the
> config
> > > >>>> options
> > > >>>>>> etc all remain the same(apart from the logger ofcourse).
> > > >>>>>>
> > > >>>>>> Let me know if any of it makes sense :D
> > > >>>>>>
> > > >>>>>> Thanks!
> > > >>>>>> Sagar.
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> On Thu, Feb 28, 2019 at 1:09 AM Dinesh Joshi
> > > >>> <[email protected]
> > > >>>>>
> > > >>>>>> wrote:
> > > >>>>>>
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>> On Feb 27, 2019, at 10:41 AM, Joseph Lynch <
> > [email protected]
> > > >
> > > >>>>>>> wrote:
> > > >>>>>>>>
> > > >>>>>>>> Vinay can confirm, but as far as I am aware we have no current
> > > plans
> > > >>>> to
> > > >>>>>>>> implement audit logging to a table directly, but the
> > > implementation
> > > >>> is
> > > >>>>>>>> fully pluggable (like compaction, compression, etc ...). Check
> > out
> > > >>> the
> > > >>>>>>> blog
> > > >>>>>>>> post [1] and documentation [2] Vinay wrote for more details,
> but
> > > the
> > > >>>>>>> short
> > > >>>>>>>
> > > >>>>>>> +1. I am still curious as to why you'd want to store audit log
> > > >>> entries
> > > >>>>>>> back in Cassandra? Depending on the scale it can generate a lot
> > of
> > > >>> load
> > > >>>>>> and
> > > >>>>>>> I think you'd end up in an infinite loop because as you're
> > > inserting
> > > >>>> the
> > > >>>>>>> audit log entry you'll generate a new one and so on unless you
> > > black
> > > >>>> list
> > > >>>>>>> audits to that table / keyspace.
> > > >>>>>>>
> > > >>>>>>> Ideally you'd insert this data into ElasticSearch / Solr or
> some
> > > >>> other
> > > >>>>>>> place that can be then used for analytics or search.
> > > >>>>>>>
> > > >>>>>>> Dinesh
> > > >>>>>>>
> > > ---------------------------------------------------------------------
> > > >>>>>>> To unsubscribe, e-mail: [email protected]
> > > >>>>>>> For additional commands, e-mail: [email protected]
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > ---------------------------------------------------------------------
> > > >>>> To unsubscribe, e-mail: [email protected]
> > > >>>> For additional commands, e-mail: [email protected]
> > > >>>>
> > > >>>>
> > > >>>
> > > >>> --
> > > >>> Jon Haddad
> > > >>>
> > >
> >
> https://urldefense.proofpoint.com/v2/url?u=http-3A__www.rustyrazorblade.com&d=DwIFaQ&c=adz96Xi0w1RHqtPMowiL2g&r=CNZK3RiJDLqhsZDG6FQGnXn8WyPRCQhp4x_uBICNC0g&m=vyXA1unA3gpHGCpKOfRurmET3jOHaV2bjs1mHVVsb2U&s=EDg90XhABktX19m4FaDHKIjFaU2YAHbXjeEGk7Jx6dk&e=
> > > <
> > >
> >
> https://urldefense.proofpoint.com/v2/url?u=http-3A__www.rustyrazorblade.com&d=DwIFaQ&c=adz96Xi0w1RHqtPMowiL2g&r=CNZK3RiJDLqhsZDG6FQGnXn8WyPRCQhp4x_uBICNC0g&m=vyXA1unA3gpHGCpKOfRurmET3jOHaV2bjs1mHVVsb2U&s=EDg90XhABktX19m4FaDHKIjFaU2YAHbXjeEGk7Jx6dk&e=
> > > >
> > > >>> twitter: rustyrazorblade
> > > >
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: [email protected]
> > > For additional commands, e-mail: [email protected]
> > >
> > >
> >
> --
> Jon Haddad
> http://www.rustyrazorblade.com
> twitter: rustyrazorblade
>

Re: Audit logging to tables.

Reply via email to