Instead of logging to tables, putting a virtual table around the audit / query logs might be an option. The same goes for the commit log for CDC.
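
To make the virtual-table idea concrete, here is a minimal conceptual sketch. It is not Cassandra code: the class name `AuditLogVirtualTable`, the column names, and the pipe-delimited text log format are all assumptions made up for illustration (the real audit / full query logs discussed in this thread are binary). The point it tries to show is that rows are produced from the existing log file at query time, so nothing is written back into regular tables.

```python
import os
import tempfile
from typing import Dict, Iterator, Optional


class AuditLogVirtualTable:
    """Hypothetical read-only view over an audit log file.

    Rows are built lazily from the file whenever the "table" is scanned,
    so querying adds no write, flush, or compaction load.
    """

    # Assumed column layout of the made-up, pipe-delimited log format.
    COLUMNS = ("timestamp", "user", "keyspace", "operation")

    def __init__(self, path: str) -> None:
        self.path = path

    def scan(self, keyspace: Optional[str] = None) -> Iterator[Dict[str, str]]:
        with open(self.path, encoding="utf-8") as fh:
            for line in fh:
                # Split into at most four fields; the operation text may contain "|".
                fields = line.rstrip("\n").split("|", len(self.COLUMNS) - 1)
                if len(fields) != len(self.COLUMNS):
                    continue  # skip malformed lines
                row = dict(zip(self.COLUMNS, fields))
                if keyspace is None or row["keyspace"] == keyspace:
                    yield row


def demo() -> list:
    """Write two sample entries to a temporary log, then query one keyspace."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "audit.log")
        with open(path, "w", encoding="utf-8") as fh:
            fh.write("1551460000|alice|ks1|SELECT * FROM t\n")
            fh.write("1551460001|bob|ks2|INSERT INTO u (id) VALUES (1)\n")
        return list(AuditLogVirtualTable(path).scan(keyspace="ks1"))
```

Calling `demo()` scans the sample file and returns only the entry for `ks1`. A real implementation would live inside the server and read the actual log format; this only illustrates the read-on-query shape of the idea.
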

On Fri, Mar 1, 2019 at 5:25 PM Sagar <sagarmeansoc...@gmail.com> wrote:

> Thanks all for the pointers. Really insightful.
>
> Subroto I think that’s part of the enterprise version but yeah even I have seen it. Again not sure of the performance implications.
>
> Sagar.
>
> On Sat, 2 Mar 2019 at 5:15 AM, Subroto Barua <sbarua...@yahoo.com.invalid> wrote:
>
> > Datastax version has an option to store audit info to dse_audit.audit_log table; I do not know the performance impact since I use the file option
> >
> > Subroto
> >
> > > On Mar 1, 2019, at 9:40 AM, Jeremiah D Jordan <jeremiah.jor...@gmail.com> wrote:
> > >
> > > AFAIK the Full Query Logging binary format was already made more general in order to support using that format for the audit logging.
> > >
> > > -Jeremiah
> > >
> > >> On Mar 1, 2019, at 11:38 AM, Joshua McKenzie <jmcken...@apache.org> wrote:
> > >>
> > >> Is there a world in which a general purpose, side-channel file storage format for transient things like this (hints, batches, audit logs, etc) could be useful as a first class citizen in the codebase? i.e. a world in which we refactored some of the hints-specific reader/writer code to be used for things like this if/when they come up?
> > >>
> > >>> On Thu, Feb 28, 2019 at 12:04 PM Jonathan Haddad <j...@jonhaddad.com> wrote:
> > >>>
> > >>> Agreed with Dinesh and Josh. I would *never* put the audit log back in Cassandra.
> > >>>
> > >>> This is extendable, Sagar, so you're free to do as you want, but I'm very opposed to putting a ticking time bomb in Cassandra proper.
> > >>>
> > >>> Jon
> > >>>
> > >>> On Thu, Feb 28, 2019 at 8:38 AM Dinesh Joshi <djos...@icloud.com.invalid> wrote:
> > >>>
> > >>>> I strongly echo Josh’s sentiment. Imagine losing audit entries because C* is overloaded? It’s fine if you don’t care about losing audit entries.
> > >>>>
> > >>>> Dinesh
> > >>>>
> > >>>>> On Feb 28, 2019, at 6:41 AM, Joshua McKenzie <jmcken...@apache.org> wrote:
> > >>>>>
> > >>>>> One of the things we've run into historically, on a *lot* of axes, is that "just put it in C*" for various functionality looks great from a user and usability perspective, and proves to be something of a nightmare from an admin / cluster behavior perspective.
> > >>>>>
> > >>>>> i.e. - cluster suffering so you're writing hints? Write them to C* tables and watch the cluster suffer more! :)
> > >>>>> Same thing probably holds true for audit logging - at a time frame when things are getting hairy w/a cluster, if you're writing that audit logging into C* proper (and dealing with ser/deser, compaction pressure, flushing pressure, etc) from that, there's a compounding effect of pressure and pain on the cluster.
> > >>>>>
> > >>>>> So the TL;DR we as a project kind of philosophically have been moving towards (I think that's valid to say?) is: use C* for the things it's absolutely great at, and try to side-channel other recovery operations as much as you can (see: file-based hints) to stay out of its way.
> > >>>>>
> > >>>>> Same thing held true w/design of CDC - I debated "materialize in memory for consumer to take over socket", and "keep the data in another C* table", but the ramifications to perf and core I/O operations in C* the moment things start to go badly were significant enough that the route we went was "do no harm". For better or for worse, as there's obvious tradeoffs there.
> > >>>>>
> > >>>>>> On Thu, Feb 28, 2019 at 7:46 AM Sagar <sagarmeansoc...@gmail.com> wrote:
> > >>>>>>
> > >>>>>> Thanks all for the pointers.
> > >>>>>>
> > >>>>>> @Joseph,
> > >>>>>>
> > >>>>>> I have gone through the links shared by you. Also, I have been looking at the code base.
> > >>>>>>
> > >>>>>> I understand the fact that pushing the logs to ES or Solr is a lot easier to do. Having said that, the only reason I thought having something like this might help is, if I don't want to add more pieces and still provide a central piece of audit logging within Cassandra itself and still be queryable.
> > >>>>>>
> > >>>>>> In terms of usages, one of them could definitely be CDC related use cases. With data being stored in tables and being queryable, it can become a lot easier to expose this data to external systems like Kafka Connect, Debezium which have the ability to push data to Kafka for example. Note that pushing data to Kafka is just an example, but what I mean is, if we can have data in tables, then instead of everyone writing custom loggers, they can hook into this table info and take action.
> > >>>>>>
> > >>>>>> Regarding the infinite loop question, I have done some analysis, and in my opinion, instead of tweaking the behaviour of Binlog and the way it functions currently, we can actually spin up another tailer thread to the same Chronicle Queue which can do the needful. This way the config options etc all remain the same (apart from the logger, of course).
> > >>>>>>
> > >>>>>> Let me know if any of it makes sense :D
> > >>>>>>
> > >>>>>> Thanks!
> > >>>>>> Sagar.
> > >>>>>>
> > >>>>>> On Thu, Feb 28, 2019 at 1:09 AM Dinesh Joshi <djos...@icloud.com.invalid> wrote:
> > >>>>>>
> > >>>>>>>
> > >>>>>>>> On Feb 27, 2019, at 10:41 AM, Joseph Lynch <joe.e.ly...@gmail.com> wrote:
> > >>>>>>>>
> > >>>>>>>> Vinay can confirm, but as far as I am aware we have no current plans to implement audit logging to a table directly, but the implementation is fully pluggable (like compaction, compression, etc ...). Check out the blog post [1] and documentation [2] Vinay wrote for more details, but the short
> > >>>>>>>
> > >>>>>>> +1. I am still curious as to why you'd want to store audit log entries back in Cassandra? Depending on the scale it can generate a lot of load and I think you'd end up in an infinite loop because as you're inserting the audit log entry you'll generate a new one and so on unless you black list audits to that table / keyspace.
> > >>>>>>>
> > >>>>>>> Ideally you'd insert this data into ElasticSearch / Solr or some other place that can be then used for analytics or search.
> > >>>>>>>
> > >>>>>>> Dinesh
> > >>>>>>>
> > >>>>>>> ---------------------------------------------------------------------
> > >>>>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > >>>>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
> > >>>>>>>
> > >>>
> > >>> --
> > >>> Jon Haddad
> > >>> https://urldefense.proofpoint.com/v2/url?u=http-3A__www.rustyrazorblade.com&d=DwIFaQ&c=adz96Xi0w1RHqtPMowiL2g&r=CNZK3RiJDLqhsZDG6FQGnXn8WyPRCQhp4x_uBICNC0g&m=vyXA1unA3gpHGCpKOfRurmET3jOHaV2bjs1mHVVsb2U&s=EDg90XhABktX19m4FaDHKIjFaU2YAHbXjeEGk7Jx6dk&e=
> > >>> twitter: rustyrazorblade

--
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade