[
https://issues.apache.org/jira/browse/HBASE-16421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15764042#comment-15764042
]
Anastasia Braginsky commented on HBASE-16421:
---------------------------------------------
A collective answer to some of the issues raised after the road map was published:
--------------------------------------------------------------------------------------------------------------
[~ram_krish]:
bq. We may need the new type of Cell which has the chunk id in it?
This is a possibility. We may have ChunkCell and HeapCell derived from Cell.
What about storing the Chunk ID as the first integer in each chunk’s byte
buffer? Then each cell that knows its offset and byte buffer can just read it
from there and return it. A Cell that has no underlying MSLAB chunk can return
-1 as its Chunk ID.
What do you think?
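To make the suggestion above concrete, here is a minimal sketch of the
"Chunk ID as the first integer of the chunk" layout. All names in it
(ChunkIdSketch, newChunk, chunkIdOf, NO_CHUNK_ID, FIRST_CELL_OFFSET) are
hypothetical and are not existing HBase classes or methods. It only assumes
that a chunk is backed by a java.nio.ByteBuffer and that a cell holds a
reference to that buffer.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration only; not part of the HBase code base.
final class ChunkIdSketch {
    // Returned by cells that are not backed by an MSLAB chunk.
    public static final int NO_CHUNK_ID = -1;
    // The first 4 bytes of every chunk are reserved for the chunk ID,
    // so cell data can start only after them.
    public static final int FIRST_CELL_OFFSET = Integer.BYTES;

    // Allocates an off-heap chunk and writes its ID at position 0.
    public static ByteBuffer newChunk(int chunkId, int capacity) {
        ByteBuffer chunk = ByteBuffer.allocateDirect(capacity);
        chunk.putInt(0, chunkId); // absolute write, position is unchanged
        return chunk;
    }

    // A cell that knows its backing buffer reads the ID from the header;
    // a cell with no backing chunk (null here) reports NO_CHUNK_ID.
    public static int chunkIdOf(ByteBuffer backingChunk) {
        return backingChunk == null ? NO_CHUNK_ID : backingChunk.getInt(0);
    }
}
```

With this layout a cell needs no extra field for the ID: for example,
ChunkIdSketch.chunkIdOf(ChunkIdSketch.newChunk(7, 64)) reads 7 back from the
buffer header, at the cost of 4 bytes per chunk.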
bq. We have an internal branch which was doing the Pipeline flushing and
creating n number of segments per snapshot. I could use that for now to test
this. But if you need to test in latest trunk - can you prepare a patch with
CellChunkMap and integrate it with the current trunk? I can give some patches
on the #2 subtask for creating chunk id and having a cell with chunk id.
Atleast from our earlier reports one thing is sure that we do create garbage
during flush for the cell creation but the overall impact of GC was much
better. So I think we are benefited there, but with the scan perf I think we
have not done any tests. For now I can do it with our internal branch but not
on latest trunk.
It is OK that your evaluation will not be on the latest trunk; what is
important is that the chunks are off-heap. Integrating CellChunkMap into the
current trunk is everything that needs to be done in task number 2. It is not a
small issue, so better not to make it a prerequisite for the prerequisite. I
think your patch should be good enough if it uses off-heap. When you say: “I
can give some patches on the #2 subtask for creating chunk id and having a cell
with chunk id”, do you mean #2 among the prerequisites or #2 among the road-map
tasks? I should really number them differently anyhow :)
--------------------------------------------------------------------------------------------------------------
[~stack]:
bq. Sorry... prob. w/ upserted cells is? Why would they not be allocated on
MSLAB?
At our last meeting we talked about cells upserted/updated by the
append/increment operations, which are not allocated on MSLAB. More generally,
this concerns any cell (small enough to fit in a regular chunk) that is not
allocated on the MSLAB even though MSLAB is enabled.
bq. Do we think these allocations long-lived? That they will migrate to
permanent heap?
The lifetime of those chunks depends on the lifetime of the cell for which the
variable-size chunk is allocated. By “permanent heap” do you mean the JVM’s
non-heap Permanent Generation area? If so, then I do not think anything
allocated dynamically can ever move to the permanent heap. It should hold only
the JVM’s metadata and statics. But maybe I am missing something.
--------------------------------------------------------------------------------------------------------------
[~anoop.hbase]:
bq. A way to flush (to disk) chunk mapped segment directly with NO need to
again make on heap Cell objects.. This is going to a big change I guess. The
entire flush path work based on a scanner and that path need Cells.
Generally I agree it would be better to flush without creating Cell objects.
But if this is a critical item, then what about the performance of all the
other scans? After all, flush uses the same scan as the others. All those paths
need Cells, and the flush scan is the less frequent one, I think. If we think
that in general we need “A way to *scan* chunk mapped segment directly with NO
need to again make on heap Cell objects”, then this is a big issue indeed. This
is why we need the scan evaluation, and if the impact is big, we need to
rethink the entire issue.
bq. Same way as above for the in memory compaction of 1+ chunk mapped segments.
Please note that we do not plan to do in-memory compaction (the EAGER one)
when CellChunkMap segments are used. CellChunkMap must go with MSLAB, and
In-Memory-Compaction must go without MSLAB...
> Introducing the CellChunkMap as a new additional index variant in the MemStore
> ------------------------------------------------------------------------------
>
> Key: HBASE-16421
> URL: https://issues.apache.org/jira/browse/HBASE-16421
> Project: HBase
> Issue Type: Umbrella
> Reporter: Anastasia Braginsky
> Attachments: CellChunkMapRevived.pdf,
> IntroductiontoNewFlatandCompactMemStore.pdf
>
>
> Follow up for HBASE-14921. This is going to be the umbrella JIRA to include
> all the parts of integration of the CellChunkMap to the MemStore.