[
https://issues.apache.org/jira/browse/CASSANDRA-8844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15243555#comment-15243555
]
Joshua McKenzie commented on CASSANDRA-8844:
--------------------------------------------
bq. I was a fan of the ReplayPosition name. It stands for a more general
concept which happens to be the commit log position for us. Further to this, it
should be a CommitLogPosition rather than ..SegmentPosition as it does not just
specify a position within a given segment but an overall position in the log
(for a specific keyspace). I am also wondering if it should not include a
keyspace id / reference now that it is keyspace-specific to be able to fail
fast on mismatch.
I appreciate the feedback here on naming but I disagree on both counts. In
"ReplayPosition" vs. "CommitLogSegmentPosition", the former couples the name
with an intended usage / implementation whereas the latter is strictly a
statement of what the object is without usage context. Regarding
CommitLogPosition vs. CommitLogSegmentPosition, the class itself contains two
instance variables: a segmentId and a position. Again, calling it a
CommitLogPosition would couple the name of the class with an intended usage
rather than leaving it modularly decoupled in my opinion.
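For illustration, the object under discussion is essentially a (segmentId, position) pair. A minimal sketch, with names and ordering assumed from the description above rather than taken from Cassandra's actual source:

```java
// Minimal sketch: the object is just a pair identifying a point in the log.
// Nothing in it is specific to replay, which is the naming argument above.
class CommitLogSegmentPosition implements Comparable<CommitLogSegmentPosition> {
    final long segmentId; // which segment file in the commit log
    final int position;   // byte offset within that segment

    CommitLogSegmentPosition(long segmentId, int position) {
        this.segmentId = segmentId;
        this.position = position;
    }

    @Override
    public int compareTo(CommitLogSegmentPosition other) {
        // Order by segment first, then by offset within the segment.
        int cmp = Long.compare(segmentId, other.segmentId);
        return cmp != 0 ? cmp : Integer.compare(position, other.position);
    }
}
```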
As for adding a keyspace id / reference and failing fast, what immediate
use-case / optimization do you have in mind where that would help us? Replay
should be limited to files in directories, and a user of the CommitLogReader
that's reading CDC logs should really have an all-or-nothing perspective on the
keyspaces in the logs they're parsing, I believe.
bq. I'd prefer to throw the WriteTimeoutException directly from allocate
(instead of catching null in CommitLog and doing the same). Doing the check
inside the while loop will avoid the over-allocation and do less work in the
common case.
Changed.
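The shape of that change, sketched with hypothetical names (the real code throws WriteTimeoutException; a generic exception and supplier stand in here): the timeout check lives inside allocate()'s retry loop, so callers never see null and the check only runs on the uncommon failed-allocation path.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical sketch: allocate() throws from inside its retry loop on
// timeout rather than returning null for the caller to translate.
class Allocator {
    static class AllocationTimeoutException extends RuntimeException {}

    private final long timeoutNanos = TimeUnit.SECONDS.toNanos(10);

    Object allocate(Supplier<Object> tryAllocate) {
        long deadline = System.nanoTime() + timeoutNanos;
        Object alloc;
        while ((alloc = tryAllocate.get()) == null) {
            // The deadline check sits on the retry path only, so the
            // common successful allocation does no extra work.
            if (System.nanoTime() > deadline)
                throw new AllocationTimeoutException();
            Thread.yield();
        }
        return alloc;
    }
}
```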
bq. Do we really need to have separate buffer pools per manager? Static (or
not) shared will offer slightly better cache locality, and it's better to block
both commit logs if we're running beyond allowed memory (we may want to double
the default limit).
I originally changed this code due to
CommitLogSegmentManagerTest.testCompressedCommitLogBackpressure failing since,
upon raising the limit to 6, the standard CLSM was "stealing" one of the
allotted buffers from the extra 3. What I didn't really take into account was
the fact that, given the AbstractCommitLogService is now using a
CommitLog.sync() that essentially does a sequential sync across all CLSM, a
delay in any of the CLSMs will lead to a delay in all of them, so having them
operate with independent buffers doesn't make any difference.
Made the pool static and upped max to 6. I prefer having this pool discrete
rather than embedded in FileDirectSegment.
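As a sketch of that shared-pool arrangement (a Semaphore stands in for the real pool bookkeeping; only the cap of 6 comes from the discussion above):

```java
import java.nio.ByteBuffer;
import java.util.concurrent.Semaphore;

// Hypothetical sketch: one static, bounded pool shared by all segment
// managers. When the cap is reached, any manager's acquire blocks, which
// matches the "block both commit logs" behavior discussed above.
class SharedBufferPool {
    static final int MAX_BUFFERS = 6;
    private static final Semaphore permits = new Semaphore(MAX_BUFFERS);

    static ByteBuffer acquire(int size) {
        permits.acquireUninterruptibly(); // blocks once the shared cap is hit
        return ByteBuffer.allocateDirect(size);
    }

    static void release() {
        permits.release();
    }
}
```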
bq. segmentManagers array: An EnumMap (which boils down to the same thing)
would be cleaner and should not have any performance impact.
Changed. Much preferred - thanks for the heads up.
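For reference, the EnumMap form looks roughly like this; the enum constants and the String standing in for the manager type are illustrative assumptions, not Cassandra's actual API:

```java
import java.util.EnumMap;

// Hypothetical sketch of keying segment managers by an enum instead of
// indexing a raw array. EnumMap is backed internally by an array indexed
// by ordinal, so a lookup costs about the same as managers[type.ordinal()].
enum SegmentManagerType { STANDARD, CDC }

class SegmentManagerRegistry {
    private final EnumMap<SegmentManagerType, String> managers =
            new EnumMap<>(SegmentManagerType.class);

    void register(SegmentManagerType type, String manager) {
        managers.put(type, manager);
    }

    String get(SegmentManagerType type) {
        return managers.get(type);
    }
}
```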
bq. shutdownBlocking: Better shutdown in parallel, i.e. initiate and await
termination separately.
Agreed. Changed.
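The pattern in question, sketched with plain ExecutorServices standing in for the segment managers: initiate shutdown on everything first, then await each, so the total wait is bounded by the slowest manager rather than the sum of all of them.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of "initiate and await separately": every manager
// begins shutting down before we block on any of them.
class ParallelShutdown {
    static void shutdownBlocking(List<ExecutorService> managers) {
        for (ExecutorService m : managers)
            m.shutdown(); // non-blocking: just initiates shutdown
        for (ExecutorService m : managers) {
            try {
                m.awaitTermination(1, TimeUnit.MINUTES);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
}
```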
{quote}reCalculating cas in maybeUpdateCDCSizeCounterAsync is fishy: makes you
think it would clear on exception in running update, which isn't the case. The
updateCDCDirectorySize body should be wrapped in try ... finally as well to do
that.
You could use a scheduled executor to avoid the explicit delays. Or a
RateLimiter (we'd prefer to update ASAP when triggered, but not too often)
instead of the delay.
updateCDCOverflowSize: use while (!reCalculating.compareAndSet(false, true))
{};. You should reset the value afterwards.
CDCSizeCalculator.calculateSize should return the size, and maybe made
synchronized for a bit of additional safety.
{quote}
Changed to RateLimiter, tossed the premature optimization of the atomic bool
protection around runnables that are going to get discarded (should all be
eden-resident and small), and moved the scheduling code into CDCSizeCalculator,
refactoring a bit along the way. The class as a whole and its flow are much
cleaner now IMO; the above points should either be addressed or no longer apply
after the change.
Let me know what you think.
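The patch uses Guava's RateLimiter; this stdlib-only sketch, with illustrative names, shows the same "update as soon as triggered, but not too often" idea:

```java
// Hypothetical sketch: run the update immediately on the first trigger,
// then suppress further triggers until the minimum interval has elapsed.
class ThrottledRecalc {
    private final long minIntervalNanos;
    private long lastRun = Long.MIN_VALUE;

    ThrottledRecalc(long minIntervalMillis) {
        this.minIntervalNanos = minIntervalMillis * 1_000_000L;
    }

    synchronized boolean maybeRun(Runnable update) {
        long now = System.nanoTime();
        if (lastRun != Long.MIN_VALUE && now - lastRun < minIntervalNanos)
            return false; // triggered too soon; drop this request
        lastRun = now;
        update.run(); // e.g. the CDC directory size recalculation
        return true;
    }
}
```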
bq. I don't get the DirectorySizeCalculator. Why the alive and visited sets,
the listFiles step? Either list the files and just loop through them, or do the
walkFileTree operation – you are now doing the same work twice. Use a plain
long instead of the atomic as the class is still thread-unsafe.
This class is actually a straight up refactor / extraction of
{{Directories.TrueFilesSizeVisitor}} on trunk. I don't doubt this class could
use some work (code's from CASSANDRA-6231 back in 2013) but I'd prefer to
handle that as a follow-up ticket.
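For contrast, the single-pass shape the review suggests looks roughly like this (illustrative, not the actual TrueFilesSizeVisitor code): one walkFileTree pass, a plain long accumulator, and no listFiles() pre-pass or alive/visited sets.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical single-pass directory size calculator: walk the tree once
// and sum file sizes into a plain long (the class is not thread-safe,
// matching the review's point that an atomic buys nothing here).
class DirectorySize {
    static long of(Path dir) {
        final long[] total = {0L};
        try {
            Files.walkFileTree(dir, new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                    total[0] += attrs.size();
                    return FileVisitResult.CONTINUE;
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total[0];
    }
}
```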
bq. Scrubber change should be reverted.
Thanks. IntelliJ IDEA got over-zealous on a refactor/rename, and I thought I'd
tracked all of those down.
bq. "Permissible" changed to "permissable" at some places in the code; the
latter is a misspelling.
Fixed.
Rebased to current trunk for good measure. Going to continue working through
feedback in chronological order come Monday.
Thanks again for all the feedback thus far.
> Change Data Capture (CDC)
> -------------------------
>
> Key: CASSANDRA-8844
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8844
> Project: Cassandra
> Issue Type: New Feature
> Components: Coordination, Local Write-Read Paths
> Reporter: Tupshin Harper
> Assignee: Joshua McKenzie
> Priority: Critical
> Fix For: 3.x
>
>
> "In databases, change data capture (CDC) is a set of software design patterns
> used to determine (and track) the data that has changed so that action can be
> taken using the changed data. Also, Change data capture (CDC) is an approach
> to data integration that is based on the identification, capture and delivery
> of the changes made to enterprise data sources."
> -Wikipedia
> As Cassandra is increasingly being used as the Source of Record (SoR) for
> mission-critical data in large enterprises, it is increasingly being called
> upon to act as the central hub of traffic and data flow to other systems. In
> order to try to address the general need, we (cc [~brianmhess]), propose
> implementing a simple data logging mechanism to enable per-table CDC patterns.
> h2. The goals:
> # Use CQL as the primary ingestion mechanism, in order to leverage its
> Consistency Level semantics, and in order to treat it as the single
> reliable/durable SoR for the data.
> # To provide a mechanism for implementing good and reliable
> (deliver-at-least-once with possible mechanisms for deliver-exactly-once)
> continuous semi-realtime feeds of mutations going into a Cassandra cluster.
> # To eliminate the developmental and operational burden of users so that they
> don't have to do dual writes to other systems.
> # For users that are currently doing batch export from a Cassandra system,
> give them the opportunity to make that realtime with a minimum of coding.
> h2. The mechanism:
> We propose a durable logging mechanism that functions similar to a commitlog,
> with the following nuances:
> - Takes place on every node, not just the coordinator, so RF number of copies
> are logged.
> - Separate log per table.
> - Per-table configuration. Only tables that are specified as CDC_LOG would do
> any logging.
> - Per DC. We are trying to keep the complexity to a minimum to make this an
> easy enhancement, but most likely use cases would prefer to only implement
> CDC logging in one (or a subset) of the DCs that are being replicated to.
> - In the critical path of ConsistencyLevel acknowledgment. Just as with the
> commitlog, failure to write to the CDC log should fail that node's write. If
> that means the requested consistency level was not met, then clients *should*
> experience UnavailableExceptions.
> - Be written in a Row-centric manner such that it is easy for consumers to
> reconstitute rows atomically.
> - Written in a simple format designed to be consumed *directly* by daemons
> written in non-JVM languages.
> h2. Nice-to-haves
> I strongly suspect that the following features will be asked for, but I also
> believe that they can be deferred to a subsequent release, in order to gauge
> actual interest.
> - Multiple logs per table. This would make it easy to have multiple
> "subscribers" to a single table's changes. A workaround would be to create a
> forking daemon listener, but that's not a great answer.
> - Log filtering. Being able to apply filters, including UDF-based filters
> would make Cassandra a much more versatile feeder into other systems, and
> again, reduce complexity that would otherwise need to be built into the
> daemons.
> h2. Format and Consumption
> - Cassandra would only write to the CDC log, and never delete from it.
> - Cleaning up consumed logfiles would be the client daemon's responsibility.
> - Logfile size should probably be configurable.
> - Logfiles should be named with a predictable naming schema, making it
> trivial to process them in order.
> - Daemons should be able to checkpoint their work, and resume from where they
> left off. This means they would have to leave some file artifact in the CDC
> log's directory.
> - A sophisticated daemon should be able to be written that could
> -- Catch up, in written-order, even when it is multiple logfiles behind in
> processing
> -- Be able to continuously "tail" the most recent logfile and get
> low-latency(ms?) access to the data as it is written.
> h2. Alternate approach
> In order to make consuming a change log easy and efficient to do with low
> latency, the following could supplement the approach outlined above:
> - Instead of writing to a logfile, by default, Cassandra could expose a
> socket for a daemon to connect to, and from which it could pull each row.
> - Cassandra would have a limited buffer for storing rows, should the listener
> become backlogged, but it would immediately spill to disk in that case, never
> incurring large in-memory costs.
> h2. Additional consumption possibility
> With all of the above, still relevant:
> - Instead of (or in addition to) using the other logging mechanisms, use CQL
> transport itself as a logger.
> - Extend the CQL protocol slightly so that rows of data can be returned to a
> listener that didn't explicitly make a query, but instead registered itself
> with Cassandra as a listener for a particular event type, and in this case,
> the event type would be anything that would otherwise go to a CDC log.
> - If there is no listener for the event type associated with that log, or if
> that listener gets backlogged, the rows will again spill to the persistent
> storage.
> h2. Possible Syntax
> {code:sql}
> CREATE TABLE ... WITH CDC LOG
> {code}
> Pros: No syntax extensions.
> Cons: Doesn't make it easy to capture the various permutations (I'm happy to
> be proven wrong) of per-DC logging. Also, the hypothetical multiple logs per
> table would break this.
> {code:sql}
> CREATE CDC_LOG mylog ON mytable WHERE MyUdf(mycol1, mycol2) = 5 with
> DCs={'dc1','dc3'}
> {code}
> Pros: Expressive and allows for easy DDL management of all aspects of CDC
> Cons: Syntax additions. Added complexity, partly for features that might not
> be implemented.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)