Thanks for your interest in the CompactionPipeline implementation. It is a
pleasure to discuss it with you.
Regarding the claim that we are implementing our own copy-on-write (COW) list:
it may be close, but in classic COW everybody shares the same read-only copy,
and when someone tries to write to that copy, the writer gets its own personal
copy updated according to the write. This is not what happens in the pipeline.
In the pipeline we let everyone read the same read-only copy, because read
accesses are more frequent. When a rare update to the pipeline happens, it is
synchronized on the pipeline itself (the writable list), and then the read-only
copy is (quickly) updated. All of this is done for faster synchronization. In
any case, I am not aware of an off-the-shelf Java list that gives me the
synchronization I want. Please correct me if I am wrong.
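To make the scheme described above concrete, here is a minimal sketch of it. It
is not the actual HBase code: the class name SketchPipeline, the generic element
type, and the helper publish() are hypothetical and chosen only for
illustration. It assumes writers lock the writable list and then publish a
fresh immutable snapshot, while readers take the snapshot without locking.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;

// Hypothetical sketch; not the real CompactionPipeline.
final class SketchPipeline<T> {
    // Writable list; every mutation synchronizes on it.
    private final LinkedList<T> pipeline = new LinkedList<>();
    // Immutable snapshot shared by all readers; replaced after each mutation.
    private volatile List<T> readOnlyCopy = Collections.emptyList();

    // Frequent path: no lock, just a volatile read of the current snapshot.
    List<T> getSegments() {
        return readOnlyCopy;
    }

    // Rare path: mutate under the lock, then publish a new snapshot.
    void pushHead(T segment) {
        synchronized (pipeline) {
            pipeline.addFirst(segment);
            publish();
        }
    }

    // Rare path: replace one element, then publish a new snapshot.
    boolean replaceAtIndex(int index, T segment) {
        synchronized (pipeline) {
            if (index < 0 || index >= pipeline.size()) {
                return false;
            }
            pipeline.set(index, segment);
            publish();
            return true;
        }
    }

    // Copies references only, never the elements. Caller must hold the lock.
    private void publish() {
        readOnlyCopy = Collections.unmodifiableList(new ArrayList<>(pipeline));
    }

    public static void main(String[] args) {
        SketchPipeline<String> p = new SketchPipeline<>();
        p.pushHead("segment-1");
        p.pushHead("segment-2");
        System.out.println(p.getSegments());
    }
}
```

A reader that obtained a snapshot before an update keeps seeing the old,
consistent list; only readers that call getSegments() afterwards see the change.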
Regarding "I am concerned about the LL copy in pushHead - even if addFirst is
faster, a LL copy is fairly slow and likely loses us any gains". As you can
see, the read-only copy is recreated whenever the background pipeline changes
(addFirst, swap, replaceAtIndex). These are rare operations that happen on
snapshot, compaction, and flattening, respectively. The copy, after all, copies
only the segment references, not the data itself. We previously had a different
type of synchronization (without the read-only copy), and it was slower. So if
you believe that creating the read-only copy is the cause of a performance
problem, please provide measurements.
Regarding "Also, I'm a little dubious on the use of LL given that we support a
replaceAtIndex which will be much faster in an array". In general I agree that
changing the implementation of "readOnlyCopy" from LinkedList to ArrayList
might be beneficial here, especially for the replaceAtIndex case. I don't see
how ArrayDeque helps us.
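As a rough illustration of the LinkedList vs. ArrayList point, the sketch below
performs an index-based replacement through the java.util.List interface. The
class ReplaceAtIndexSketch and its methods are hypothetical and are not taken
from HBase. The result is the same for both list types; what differs is the
cost of set(index, ...), which is constant time for ArrayList and requires
walking the nodes for LinkedList.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical sketch; not the real CompactionPipeline.
final class ReplaceAtIndexSketch {
    // Works with any List; ArrayList.set is O(1), LinkedList.set is O(n).
    // ArrayDeque could not be passed here: it implements Deque, not List,
    // and offers no index-based set.
    static <T> void replaceAtIndex(List<T> segments, int index, T replacement) {
        segments.set(index, replacement);
    }

    // Fills the given list with three placeholder entries and replaces the middle one.
    static List<String> demo(List<String> segments) {
        segments.add("s0");
        segments.add("s1");
        segments.add("s2");
        replaceAtIndex(segments, 1, "merged");
        return segments;
    }

    public static void main(String[] args) {
        System.out.println(demo(new ArrayList<>()));
        System.out.println(demo(new LinkedList<>()));
    }
}
```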
On Sunday, March 11, 2018, 8:06:05 AM GMT+2, 张铎(Duo Zhang)
I believe the comments there are mainly about the concurrency problem, not
about linked list vs. array list, at least for me...
2018-03-11 4:12 GMT+08:00 Mike Drob <mad...@cloudera.com>:
> Hi devs,
> I was reading through HBASE-17434 trying to understand why we have two
> linked lists in compaction pipeline and I'm having trouble following the
> conversation there, especially since it seems intertwined with HBASE-17379
> and jumps back and forth a few times.
> It looks like we are implementing our own copy-on-write list, and there is
> a claim that addFirst is faster on a LinkedList than an array based list. I
> am concerned about the LL copy in pushHead - even if addFirst is faster, a
> LL copy is fairly slow and likely loses us any gains. Also, I'm a little
> dubious on the use of LL given that we support a replaceAtIndex which will
> be much faster in an array.
> Can we improve by using an ArrayDeque?
> Eschar, Anastasia, WDYT?
> Some observations about performance -