Re: Questions about synchronization in compaction pipeline

Anastasia Braginsky Sun, 11 Mar 2018 13:27:59 -0700

Hi Mike,
Thanks for your interest in the CompactionPipeline implementation. It is a
pleasure to discuss it with you.
Regarding the point that we are implementing our own copy-on-write (COW) list:
maybe it is close, but in classic COW everybody shares the same read-only copy,
and when someone tries to write to that copy, the writer gets its own personal
copy, updated according to that write. This is not what happens in the
pipeline. In the pipeline we let everyone read the same read-only copy, because
read accesses are more frequent. When a rare update to the pipeline happens, it
is synchronized on the pipeline itself (the writable list) and the read-only
copy is then updated (quickly). All of this is done for faster synchronization.
Anyway, I am not aware of any off-the-shelf Java list that gives me the
synchronization I want. Please correct me if I am wrong.
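
A minimal sketch of the scheme described above, for readers following the
thread. It is not the HBase code: the class name PipelineSketch and the method
getSegments are hypothetical, and pushHead stands in for any of the rare
update operations. Writers synchronize on the writable list and republish an
immutable snapshot; readers take the snapshot without locking.

```java
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;

// Hypothetical illustration only; names do not come from the HBase source.
class PipelineSketch<T> {
    // Writable list, modified only while holding its monitor.
    private final LinkedList<T> pipeline = new LinkedList<>();
    // Snapshot shared by all readers; volatile so a new snapshot is visible at once.
    private volatile List<T> readOnlyCopy = Collections.emptyList();

    // Rare update: lock the writable list, change it, then republish the snapshot.
    void pushHead(T segment) {
        synchronized (pipeline) {
            pipeline.addFirst(segment);
            readOnlyCopy = Collections.unmodifiableList(new LinkedList<>(pipeline));
        }
    }

    // Frequent read: no lock, just return the current snapshot.
    List<T> getSegments() {
        return readOnlyCopy;
    }

    public static void main(String[] args) {
        PipelineSketch<String> p = new PipelineSketch<>();
        p.pushHead("segment-1");
        List<String> earlier = p.getSegments();
        p.pushHead("segment-2");
        System.out.println(earlier);          // snapshot taken before the second update
        System.out.println(p.getSegments());  // snapshot taken after it
    }
}
```

A reader that obtained a snapshot before an update keeps seeing that snapshot
unchanged, which is why reads need no lock in this sketch.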
Regarding "I am concerned about the LL copy in pushHead - even if addFirst is
faster, a LL copy is fairly slow and likely loses us any gains": as you can
see, the read-only copy is recreated whenever the background pipeline changes
(addFirst, swap, replaceAtIndex). These are rare operations, happening on
snapshot, compaction, and flattening, respectively. The copy, after all, copies
only the references to the segments, not the data itself. We previously had a
different type of synchronization (without the read-only copy) and it was
slower. So if you believe that creating the read-only copy is the key to some
performance problem, please provide measurements.
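
To illustrate the point that the copy duplicates references rather than
segment data, here is a small sketch. The Segment class and the copyOf helper
are hypothetical stand-ins, not the HBase types.

```java
import java.util.LinkedList;
import java.util.List;

// Hypothetical illustration only; Segment here is a stand-in, not the HBase class.
class ShallowCopySketch {
    static final class Segment {
        final byte[] data;
        Segment(int size) { this.data = new byte[size]; }
    }

    // Copies the list structure only; every element is the same Segment object.
    static List<Segment> copyOf(List<Segment> pipeline) {
        return new LinkedList<>(pipeline);
    }

    public static void main(String[] args) {
        LinkedList<Segment> pipeline = new LinkedList<>();
        pipeline.addFirst(new Segment(1 << 20));
        List<Segment> copy = copyOf(pipeline);
        // Reference comparison: the copy points at the very same segment.
        System.out.println(copy.get(0) == pipeline.get(0));
    }
}
```

The cost of the copy therefore grows with the number of segments in the list,
not with the amount of data they hold.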
Regarding "Also, I'm a little dubious on the use of LL given that we support a
replaceAtIndex which will be much faster in an array": generally I agree that
changing the implementation of "readOnlyCopy" from LinkedList to ArrayList
might be beneficial here, especially for the replaceAtIndex case. I don't see
how ArrayDeque helps us.
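
A hedged sketch of why ArrayList fits the replaceAtIndex case: List.set(index,
element) is constant time on an ArrayList, while a LinkedList has to walk to
the index first. ArrayDeque does not implement List and has no index-based
set. The helper below is hypothetical and only shares its name with the
pipeline operation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

// Hypothetical illustration only; not the HBase implementation.
class ReplaceAtIndexSketch {
    // Builds a fresh array-backed copy with one element replaced;
    // the input list is left untouched.
    static <T> List<T> replaceAtIndex(List<T> current, int index, T replacement) {
        List<T> copy = new ArrayList<>(current);
        copy.set(index, replacement); // direct array access on an ArrayList
        return copy;
    }

    public static void main(String[] args) {
        List<String> linked = new LinkedList<>(Arrays.asList("a", "b", "c"));
        List<String> updated = replaceAtIndex(linked, 1, "flattened-b");
        System.out.println(linked);
        System.out.println(updated);
    }
}
```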
Thanks,
Anastasia
    On Sunday, March 11, 2018, 8:06:05 AM GMT+2, 张铎(Duo Zhang) 
<[email protected]> wrote:  
 
I believe the comments there are mainly about the concurrency problem, not
about linked list vs. array list, at least for me...


2018-03-11 4:12 GMT+08:00 Mike Drob <[email protected]>:

> Hi devs,
>
> I was reading through HBASE-17434 trying to understand why we have two
> linked lists in compaction pipeline and I'm having trouble following the
> conversation there, especially since it seems intertwined with HBASE-17379
> and jumps back and forth a few times.
>
> It looks like we are implementing our own copy-on-write list, and there is
> a claim that addFirst is faster on a LinkedList than an array based list. I
> am concerned about the LL copy in pushHead - even if addFirst is faster, a
> LL copy is fairly slow and likely loses us any gains. Also, I'm a little
> dubious on the use of LL given that we support a replaceAtIndex which will
> be much faster in an array.
>
> Can we improve by using an ArrayDeque?
>
> Eschar, Anastasia, WDYT?
>
> Thanks,
> Mike
>
> Some observations about performance -
> https://stuartmarks.wordpress.com/2015/12/18/some-java-list-benchmarks/
>
