[
https://issues.apache.org/jira/browse/CASSANDRA-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915309#action_12915309
]
Sylvain Lebresne commented on CASSANDRA-1546:
---------------------------------------------
I realized while answering your comment that I had forgotten something, so I
updated the patch.
{quote}
Does the lead replica have to iterate over all SSTables and get the latest
value of the counter before applying the decr/incr mutation? If so, the read
path can be a performance bottleneck. But we can leverage some tricks: only the
counter columns in the latest SSTable are valid, and those in older SSTables
can be safely ignored.
So a frequently updated counter column can reside in the memtable, and the
local read-modify-write operation brings only a negligible performance loss.
The counter update path is almost as fast as the normal column update path.
{quote}
During a write, after the increment has been applied locally, there is a read
of one column (the one corresponding to the local count).
This is the value that is sent for replication (it thus includes the freshly
written update). This read is a normal read, so it hits as many sstables as
need be, if that's what you mean. But only one column is read.
One way to make this read fast is to use the row cache on the counter CF. It is
true, however, that because of the marker columns the row may become fairly
large with high-volume counters (even though the row is never read entirely).
You can tune the ttl of the marker columns to keep that manageable (the ttl on
the marker can be pretty small, on the order of a minute or so). As said, you
can also skip the marker columns if you're ready to accept the potential
drawbacks, in which case the counter row will be really small and a very good
candidate for the row cache. I don't know if that is what you were proposing?
Lastly, note that at CL.ONE and without marker columns, the counter update path
will be as fast as for a normal column, at least as far as clients are
concerned, because on the leader replica we write, then read and replicate.
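The write path described above can be sketched roughly as follows. This is a
toy in-memory model, not code from the patch: the Replica class, the per-node
sub-count layout, and leader_write are hypothetical names, and the sketch
assumes followers simply store the leader's count as-is (timestamps and marker
columns are left out).

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy replica: maps counter name -> {node id -> that node's sub-count}."""
    node_id: str
    counts: dict = field(default_factory=dict)

    def local_count(self, counter: str) -> int:
        # The single-column read: only this node's own sub-count is fetched.
        return self.counts.get(counter, {}).get(self.node_id, 0)

    def apply_increment(self, counter: str, delta: int) -> int:
        # Step 1: apply the increment locally.
        shard = self.counts.setdefault(counter, {})
        shard[self.node_id] = shard.get(self.node_id, 0) + delta
        # Step 2: read back the local count, which now includes the update.
        return self.local_count(counter)

    def receive_replicated(self, counter: str, leader_id: str, value: int) -> None:
        # Assumption: followers overwrite the leader's sub-count with the
        # replicated value instead of re-applying a delta.
        self.counts.setdefault(counter, {})[leader_id] = value

    def total(self, counter: str) -> int:
        # The counter value is the sum of every node's sub-count.
        return sum(self.counts.get(counter, {}).values())


def leader_write(leader: Replica, followers: list, counter: str, delta: int) -> int:
    # Write locally, read the one local column, then replicate that value.
    value = leader.apply_increment(counter, delta)
    for follower in followers:
        follower.receive_replicated(counter, leader.node_id, value)
    return value
```

Because the replicated value is the leader's running count rather than the
delta, replaying the same replication message on a follower does not
double-count in this model.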
{quote}
I have no idea about the details of the removal-before-incr/decr problem. But a
quick solution could be to let the deletion operation snapshot the current
value of the counter column and write it to another column. Then just let the
read path merge these columns, including the different counter columns and the
deletion snapshot column.
{quote}
Ok, the problem is the following: suppose you issue one increment (+1), then
you remove the counter, then you increment again (+1).
Say the leader replica is always the same one, but it receives the two
increments first. It will 'merge' those two increments, and we end up with one
column whose count is 2 and whose timestamp is that of the last increment. Then
it receives the delete.
But as far as the replica is concerned, this delete is obsolete and will be
discarded. Even if we were somehow able to detect that the delete should have
deleted something, how can we know which parts of the now-merged count should
be kept?
So basically, remove works if you don't reuse the counter afterwards :) Or
after a sufficient time has elapsed. Otherwise, it may work or it may not :(
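To make the reordering concrete, here is a minimal simulation of the scenario.
It is only a sketch: CounterColumn and replay are hypothetical names, the
timestamps 1, 2, 3 are made up, and the rule that a delete wins only when its
timestamp is at least the column's timestamp is an assumption standing in for
the real reconciliation logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CounterColumn:
    count: int
    timestamp: int


def replay(ops) -> Optional[CounterColumn]:
    """Apply ("inc", delta, ts) and ("del", 0, ts) operations in arrival order."""
    col = None
    for kind, delta, ts in ops:
        if kind == "inc":
            if col is None:
                col = CounterColumn(delta, ts)
            else:
                # Merging keeps one column: summed count, latest timestamp.
                col = CounterColumn(col.count + delta, max(col.timestamp, ts))
        elif kind == "del":
            # A delete older than the merged column looks obsolete and is dropped.
            if col is not None and ts >= col.timestamp:
                col = None
    return col


# Order in which the client issued the operations: +1, remove, +1.
issued = [("inc", 1, 1), ("del", 0, 2), ("inc", 1, 3)]
# Order in which the leader receives them: both increments, then the delete.
reordered = [issued[0], issued[2], issued[1]]

print(replay(issued))     # CounterColumn(count=1, timestamp=3)
print(replay(reordered))  # CounterColumn(count=2, timestamp=3)
```

In the reordered run the merged column carries timestamp 3, so the delete at
timestamp 2 is discarded and the first increment is never removed.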
Even though this is really unfortunate, I don't see it as a blocker, since
people can always reset the counter by reading the value v of the counter and
then inserting -v. I'm also sure we can come up with something smarter later.
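The reset workaround amounts to something like the following sketch.
ToyCounterStore and reset_counter are hypothetical stand-ins for whatever
client API ends up exposing counter reads and increments; nothing here is taken
from the patch.

```python
class ToyCounterStore:
    """In-memory stand-in for a counter column family (hypothetical API)."""

    def __init__(self):
        self._counts = {}

    def add(self, key: str, delta: int) -> None:
        self._counts[key] = self._counts.get(key, 0) + delta

    def get(self, key: str) -> int:
        return self._counts.get(key, 0)


def reset_counter(store: ToyCounterStore, key: str) -> int:
    # Read the current value v, then insert -v so the counter sums to zero.
    v = store.get(key)
    if v != 0:
        store.add(key, -v)
    return v
```

Note that the read and the write are two separate operations, so an increment
that lands between them is not cancelled; it simply remains as the counter's
new value.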
> (Yet another) approach to counting
> ----------------------------------
>
> Key: CASSANDRA-1546
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1546
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Reporter: Sylvain Lebresne
> Assignee: Sylvain Lebresne
> Fix For: 0.7.0
>
> Attachments: 0001-Remove-IClock-from-internals.patch,
> 0002-Counters.patch, 0003-Generated-thrift-files-changes.patch
>
>
> This could be described as a mix between CASSANDRA-1072 without clocks and
> CASSANDRA-1421.
> More details in the comment below.