[ 
https://issues.apache.org/jira/browse/CASSANDRA-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836000#comment-13836000
 ] 

Benedict edited comment on CASSANDRA-3578 at 12/1/13 11:20 AM:
---------------------------------------------------------------

bq. I understand what sync is doing
I just wanted to spell it all out so I could write next three times :-)

bq. Your own comments refer to "last" or "previous" in multiple places which 
speaks to how difficult it is to avoid thinking of it that way

I agree it's difficult to avoid talking about last when talking about next, but 
this is typical since it's chained. I still disagree about the rename, but as I 
alluded to do agree that both are justifiable positions. To me it is very 
unintuitive in the sync() method (which is the most complex method) to refer to 
the sync position we haven't yet populated as the "last" sync marker position. 
Also, in the isFullySynced(), I think both work - as if the next as past the 
end of the file, we're clearly referring to the next segment, no? I think we 
wouldn't be having this argument if I'd stuck with "header" instead of "marker" 
:-)

That all said, it's definitely not worth any more argument, so I'll bow out at 
this point if I still haven't persuaded you!

bq. couldn't we dispense with WaitQueue in favor of AtomicReference<Condition>
Well, WaitQueue is non-blocking, in a few senses:
# It is non-blocking in its *abstraction* which makes it absolutely the correct 
abstraction to use here. To use a Condition object would require considerably 
more ugliness than a WaitQueue, as we attempted to mimic its behaviour. 
However, you could implement a WaitQueue supporting only signalAll() using a 
Lock and an AtomicReference<Condition> under the hood.
# It is non-blocking in its implementation (well, almost; certainly in this use 
it is non-blocking with respect to consumers/producers blocking each other, but 
not necessarily consumers/consumers or producers/producers, although this is 
easily rectified in its current design). In this use case we could certainly 
handle blocking each other if we had to, but it is ugly to do so when it isn't 
necessary in my book.
# When signalling threads, all waiting threads are activated immediately, 
whereas with a Condition object they must wake up serially, incurring scheduler 
delay for each wake-up.

Something tangential also occurred to me, maybe worth spinning out a ticket 
for: As we currently stand we can only support max_concurrent_write TOTAL 
writes per batch with BatchCLE. If we were now to split out the CL.add() into 
the calling thread instead of the write stage, returning a Future to wait on in 
the write stage, we could support max_connection writes instead, which would be 
much larger. Although admittedly we would have to be careful about not OOMing 
if there are a lot of large writes outstanding. It could do a lot for 
throughput in BatchCLE potentially though.



was (Author: benedict):
bq. I understand what sync is doing
I just wanted to spell it all out so I could write next three times :-)

bq. Your own comments refer to "last" or "previous" in multiple places which 
speaks to how difficult it is to avoid thinking of it that way

I agree it's difficult to avoid talking about last when talking about next, but 
this is typical since it's chained. I still disagree about the rename, but as I 
alluded to do agree that both are justifiable positions. To me it is very 
unintuitive in the sync() method (which is the most complex method) to refer to 
the sync position we haven't yet populated as the "last" sync marker position. 
Also, in the isFullySynced(), I think both work - as if the next as past the 
end of the file, we're clearly referring to the next segment, no? I think we 
wouldn't be having this argument if I'd stuck with "header" instead of "marker" 
:-)

That all said, it's definitely not worth any more argument, so I'll bow out at 
this point if I still haven't persuaded you!

bq. couldn't we dispense with WaitQueue in favor of AtomicReference<Condition>
Well, WaitQueue is non-blocking, in two senses:
# It is non-blocking in its *abstraction* which makes it absolutely the correct 
abstraction to use here. To use a Condition object would require considerably 
more ugliness than a WaitQueue, as we attempted to mimic its behaviour. 
However, you could implement a WaitQueue supporting only signalAll() using a 
Lock and an AtomicReference<Condition> under the hood.
# It is non-blocking in its implementation (well, almost; certainly in this use 
it is non-blocking with respect to consumers/producers blocking each other, but 
not necessarily consumers/consumers or producers/producers, although this is 
easily rectified in its current design). In this use case we could certainly 
handle blocking each other if we had to, but it is ugly to do so when it isn't 
necessary in my book.



> Multithreaded commitlog
> -----------------------
>
>                 Key: CASSANDRA-3578
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3578
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Benedict
>            Priority: Minor
>              Labels: performance
>         Attachments: 0001-CASSANDRA-3578.patch, ComitlogStress.java, 
> Current-CL.png, Multi-Threded-CL.png, TestEA.java, latency.svg, oprate.svg, 
> parallel_commit_log_2.patch
>
>
> Brian Aker pointed out a while ago that allowing multiple threads to modify 
> the commitlog simultaneously (reserving space for each with a CAS first, the 
> way we do in the SlabAllocator.Region.allocate) can improve performance, 
> since you're not bottlenecking on a single thread to do all the copying and 
> CRC computation.
> Now that we use mmap'd CommitLog segments (CASSANDRA-3411) this becomes 
> doable.
> (moved from CASSANDRA-622, which was getting a bit muddled.)



--
This message was sent by Atlassian JIRA
(v6.1#6144)

Reply via email to