[ https://issues.apache.org/jira/browse/HBASE-14921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15387470#comment-15387470 ]
Anastasia Braginsky commented on HBASE-14921:
---------------------------------------------
Thank you [~anoop.hbase] for your very reasonable comments!
bq. But when the use case is something like time-series data, where we
really don't expect duplicates/updates, it might be better to turn off
compaction and do only flattening.
Do you suggest making an externally configurable flag for turning compaction
on and off? If so, what should the default value of this flag be? Didn't we
want sysadmins to work less with all those flags and settings (that we already
have)? If the pre-check appears to decrease performance, we can run this
compaction pre-check scan only on every second (or every Xth) flush to the
pipeline.
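To make the idea concrete, below is a minimal sketch (hypothetical, not actual HBase code; the class and field names are made up for illustration) of gating the compaction pre-check behind an on/off flag and a flush cadence:

{code:java}
import java.util.concurrent.atomic.AtomicLong;

// Minimal sketch, assuming a hypothetical per-store switch and cadence;
// neither name exists in HBase, they only illustrate the proposal above.
public class CompactionGate {
  private final boolean compactionEnabled;  // the externally configurable on/off flag
  private final int preCheckEveryNthFlush;  // run the pre-check scan only every Nth flush

  private final AtomicLong flushesToPipeline = new AtomicLong();

  public CompactionGate(boolean compactionEnabled, int preCheckEveryNthFlush) {
    this.compactionEnabled = compactionEnabled;
    this.preCheckEveryNthFlush = Math.max(1, preCheckEveryNthFlush);
  }

  // Called on each in-memory flush to the pipeline; true means
  // "run the compaction pre-check scan now".
  public boolean shouldRunPreCheck() {
    if (!compactionEnabled) {
      return false;
    }
    return flushesToPipeline.incrementAndGet() % preCheckEveryNthFlush == 0;
  }
}
{code}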
bq. Again, flattening to CellChunkMap would be ideal, as that will release
and reduce the heap memory footprint for this memstore considerably.
CellArrayMap, yes, it reduces it, but not by much.
CellChunkMap is valuable because it can be taken off-heap, but it does not
significantly reduce memory usage compared to CellArrayMap. All that
CellChunkMap saves memory-wise is that the Cell object is now "embedded" as
part of the array, so you do not need the reference and the object overhead.
The difference between CellArrayMap and CellChunkMap is therefore about 24
bytes per Cell.
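For reference, the back-of-envelope accounting behind the ~24 bytes, assuming a 64-bit JVM without compressed oops (exact sizes are JVM-dependent):

{code:java}
// Back-of-envelope per-Cell accounting (assumes a 64-bit JVM without
// compressed oops; the exact sizes are JVM-dependent).
public class PerCellOverhead {
  public static void main(String[] args) {
    final int reference = 8;      // slot in CellArrayMap referencing the Cell object
    final int objectHeader = 16;  // header of the standalone Cell object itself
    // CellChunkMap embeds the cell data in the chunk, avoiding both.
    System.out.println("Saved per Cell: " + (reference + objectHeader) + " bytes");
  }
}
{code}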
bq. In your use case, the max advantage you get is because of the compaction,
as many cells will get removed.
I do not agree. In our experiments we use (on purpose) a uniform distribution
with small data sizes, so we have few duplicates. Even so, we see that the
compaction has little impact on the performance.
bq. Another concern of mine is regarding the fact that in this memstore only
the tail of the pipeline gets flushed to disk when a flush request comes. In
the 1st version the compaction always happened, so there was every chance that
the tail of the pipeline was much bigger and that much data got flushed. Now,
when compaction is not happening at all and we have many small segments in the
pipeline, it would be better to flush all the segments to disk rather than
making small flushes. I raised this concern at the first step also, but the
counter then was that the compaction always happens; now that is not the case.
I remember this concern of yours from the code review. It is a valid concern
and we are thinking about it. Apparently, this is one more reason to do
compactions (at least merges) once in a while. We can do it when we have,
e.g., 10 segments in the pipeline. If we simply flush it all to disk, we will
create many small files, and their compaction will then have to run on
disk...
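A toy sketch of that merge-on-threshold idea (hypothetical, not HBase code; segments are modeled as sorted String maps just to keep the example self-contained):

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy sketch of merging pipeline segments once a threshold is reached,
// so a later disk flush writes one reasonably sized file instead of many
// small ones. Segments are sorted String maps only for illustration.
public class PipelineMergePolicy {
  static final int MERGE_THRESHOLD = 10; // e.g. merge once 10 segments accumulate

  private final List<TreeMap<String, String>> pipeline = new ArrayList<>();

  void flushToPipeline(TreeMap<String, String> segment) {
    pipeline.add(segment);
    if (pipeline.size() >= MERGE_THRESHOLD) {
      mergeAll();
    }
  }

  // Merge all segments into one flat segment.
  private void mergeAll() {
    TreeMap<String, String> merged = new TreeMap<>();
    for (TreeMap<String, String> segment : pipeline) {
      merged.putAll(segment); // later (newer) segments overwrite older duplicates
    }
    pipeline.clear();
    pipeline.add(merged);
  }
}
{code}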
bq. JFYI... there is a periodic memstore flush check. If we accumulate more
than 30 million edits in the memstore, we will flush.
We know there is a flush to disk about once every hour. The main reason for
that is the WAL, right? Otherwise, why would we care how many cells are in
memory? Actually, maybe in this case we do not want to flush absolutely
everything to disk; perhaps flushing just the oldest part, so the WAL can be
truncated a bit, is enough?
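A sketch of that partial-flush idea (hypothetical, not HBase code; the sequence-id bookkeeping is reduced to a single long per segment):

{code:java}
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of the partial-flush idea above. Each segment carries the highest
// WAL sequence id it contains, reduced here to a single long.
public class PartialFlushForWal {
  static final class Segment {
    final long maxSeqId;
    Segment(long maxSeqId) { this.maxSeqId = maxSeqId; }
  }

  // Oldest segment at the head, newest at the tail.
  private final Deque<Segment> pipeline = new ArrayDeque<>();

  void addSegment(Segment s) { pipeline.addLast(s); }

  // On the periodic flush request, flush only the oldest segment and
  // return its max sequence id: the WAL can then be truncated up to it,
  // without writing the whole memstore to disk.
  long flushOldest() {
    Segment oldest = pipeline.pollFirst();
    if (oldest == null) {
      return -1; // nothing to flush
    }
    writeToDisk(oldest);
    return oldest.maxSeqId;
  }

  private void writeToDisk(Segment s) {
    // placeholder for the actual flush-to-HFile path
  }
}
{code}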
> Memory optimizations
> --------------------
>
> Key: HBASE-14921
> URL: https://issues.apache.org/jira/browse/HBASE-14921
> Project: HBase
> Issue Type: Sub-task
> Affects Versions: 2.0.0
> Reporter: Eshcar Hillel
> Assignee: Anastasia Braginsky
> Attachments: CellBlocksSegmentInMemStore.pdf,
> CellBlocksSegmentinthecontextofMemStore(1).pdf, HBASE-14921-V01.patch,
> HBASE-14921-V02.patch, HBASE-14921-V03.patch, HBASE-14921-V04-CA-V02.patch,
> HBASE-14921-V04-CA.patch, HBASE-14921-V05-CAO.patch,
> HBASE-14921-V06-CAO.patch, InitialCellArrayMapEvaluation.pdf,
> IntroductiontoNewFlatandCompactMemStore.pdf
>
>
> Memory optimizations including compressed format representation and offheap
> allocations