[ https://issues.apache.org/jira/browse/HBASE-16421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15764042#comment-15764042 ]
Anastasia Braginsky commented on HBASE-16421:
---------------------------------------------

A collective answer to some of the issues raised after the road map was published:

--------------------------------------------------------------------------------------------------------------
[~ram_krish]:

bq. We may need the new type of Cell which has the chunk id in it?

This is a possibility. We could have ChunkCell and HeapCell derived from Cell. What about storing the chunk ID as the first integer in each chunk’s byte buffer? Then any cell that knows its byte buffer and its offset can simply read the ID from there and return it. A Cell that has no underlying MSLAB chunk can return -1 as its chunk ID. What do you think?

bq. We have an internal branch which was doing the Pipeline flushing and creating n number of segments per snapshot. I could use that for now to test this. But if you need to test in latest trunk - can you prepare a patch with CellChunkMap and integrate it with the current trunk? I can give some patches on the #2 subtask for creating chunk id and having a cell with chunk id. Atleast from our earlier reports one thing is sure that we do create garbage during flush for the cell creation but the overall impact of GC was much better. So I think we are benefited there, but with the scan perf I think we have not done any tests. For now I can do it with our internal branch but not on latest trunk.

It is OK if your evaluation is not on the latest trunk; what matters is that the chunks are off-heap. Integrating CellChunkMap into the current trunk is all that needs to be done in task number 2, and that is not a small issue, so it is better not to make it a prerequisite for the prerequisite. I think your patch should be good enough if it uses off-heap chunks. When you say “I can give some patches on the #2 subtask for creating chunk id and having a cell with chunk id”, do you mean #2 among the prerequisites or #2 among the road-map tasks?
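To make the chunk-ID proposal in the reply to [~ram_krish] concrete, here is a minimal Java sketch of the layout under discussion. It assumes the ID occupies the first four bytes of each chunk and that a cell keeps a reference to its chunk's buffer plus its own offset. All names (ChunkIdSketch, CellWithChunkId, ChunkBackedCell, HeapBackedCell, newChunk) are hypothetical and are not existing HBase classes; only the -1 value for cells without a chunk comes from the comment itself.

```java
import java.nio.ByteBuffer;

/** Illustrative sketch only; none of these names are HBase APIs. */
final class ChunkIdSketch {
    /** Assumed layout: bytes 0..3 of every chunk hold the chunk ID. */
    static final int CHUNK_ID_OFFSET = 0;
    static final int HEADER_SIZE = Integer.BYTES;
    /** Returned by cells that are not backed by an MSLAB chunk. */
    static final int NO_CHUNK_ID = -1;

    /** Hypothetical common view of a cell that can report its chunk ID. */
    interface CellWithChunkId {
        int getChunkId();
    }

    /** Allocates an off-heap chunk and writes its ID as the first integer. */
    static ByteBuffer newChunk(int chunkId, int capacity) {
        ByteBuffer chunk = ByteBuffer.allocateDirect(capacity);
        chunk.putInt(CHUNK_ID_OFFSET, chunkId); // absolute put, position unchanged
        return chunk;
    }

    /** A cell whose data lives inside a chunk at a known offset. */
    static final class ChunkBackedCell implements CellWithChunkId {
        private final ByteBuffer chunk;
        private final int offset;

        ChunkBackedCell(ByteBuffer chunk, int offset) {
            if (offset < HEADER_SIZE) {
                throw new IllegalArgumentException("cell data would overlap the chunk ID header");
            }
            this.chunk = chunk;
            this.offset = offset;
        }

        /** Reads the ID straight from the chunk header; nothing is stored per cell. */
        @Override
        public int getChunkId() {
            return chunk.getInt(CHUNK_ID_OFFSET);
        }

        int getOffset() {
            return offset;
        }
    }

    /** A cell allocated on the Java heap, outside any chunk. */
    static final class HeapBackedCell implements CellWithChunkId {
        @Override
        public int getChunkId() {
            return NO_CHUNK_ID;
        }
    }

    public static void main(String[] args) {
        ByteBuffer chunk = newChunk(7, 64);
        System.out.println(new ChunkBackedCell(chunk, 16).getChunkId());
        System.out.println(new HeapBackedCell().getChunkId());
    }
}
```

With this layout a cell needs no extra field for the ID, at the cost of four bytes per chunk and one buffer read per lookup. Whether that trade-off suits the real Cell hierarchy is exactly the open question above.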
I should really number the prerequisites and the road-map tasks differently :)

--------------------------------------------------------------------------------------------------------------
[~stack]:

bq. Sorry... prob. w/ upserted cells is? Why would they not be allocated on MSLAB?

At our last meeting we talked about cells upserted/updated by the append/increment operations, which are not allocated on the MSLAB. More generally, this covers any cell (small enough to fit a regular chunk) that is not allocated on the MSLAB even though MSLAB is enabled.

bq. Do we think these allocations long-lived? That they will migrate to permanent heap?

The lifetime of those chunks depends on the lifetime of the cell for which the variable-size chunk is allocated. By “permanent heap”, do you mean the JVM’s non-heap Permanent Generation area? If so, I do not think anything allocated dynamically can ever move to the permanent heap. It should hold only the JVM’s metadata and statics. But maybe I am missing something.

--------------------------------------------------------------------------------------------------------------
[~anoop.hbase]:

bq. A way to flush (to disk) chunk mapped segment directly with NO need to again make on heap Cell objects.. This is going to a big change I guess. The entire flush path work based on a scanner and that path need Cells.

Generally I agree that it would be better to flush without creating Cell objects. But if this is a critical item, then what about the performance of all the other scans? After all, flush uses the same scan as everything else. All those paths need Cells, and the flush scan is the less frequent one, I think. If we believe we need “A way to *scan* chunk mapped segment directly with NO need to again make on heap Cell objects”, then this is a big issue indeed. This is why we need the scan evaluation; if the impact is big, we need to rethink the entire issue.

bq. Same way as above for the in memory compaction of 1+ chunk mapped segments.
Please note that we do not plan to do in-memory compaction (the EAGER kind) when CellChunkMap segments are used. CellChunkMap must go with MSLAB, and in-memory compaction must go without MSLAB...

> Introducing the CellChunkMap as a new additional index variant in the MemStore
> ------------------------------------------------------------------------------
>
>                 Key: HBASE-16421
>                 URL: https://issues.apache.org/jira/browse/HBASE-16421
>             Project: HBase
>          Issue Type: Umbrella
>            Reporter: Anastasia Braginsky
>         Attachments: CellChunkMapRevived.pdf, IntroductiontoNewFlatandCompactMemStore.pdf
>
>
> Follow up for HBASE-14921. This is going to be the umbrella JIRA to include
> all the parts of integration of the CellChunkMap to the MemStore.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)