Hi Geoffrey,

With the off-heap backed write path (at the RPC layer itself), the write payload is read into direct ByteBuffers (DBBs) that we get from a pool, and the Cells are created over these DBBs. If we add Tags in coprocessors (CPs), a new Cell POJO is created, but it still refers to the old POJO for every part except the Tags. See TagRewriteCell, for example.

Only when we add the cells to the MemStore do we pull them out of this RPC-side buffer. In the write path, once the call completes and control comes back to the RPC layer, we release the buffer there. So there should be no worry about a leak.

The one thing to be careful about in CPs is keeping references to Cells. In such cases it is advised to clone the cell (or the parts of it you need) and keep a reference to the clone instead. When the RPC side uses pooled DBBs, not doing this correctly can cause the Cell to be corrupted later. (The buffer is released once the RPC call is over and is later reused to read some other write payload.) Even when the RPC layer uses on-heap buffers, keeping such a reference without cloning can cause issues, because it prevents the RPC payload read buffer (typically much larger than a single cell) from being GCed.

I know the aim of the Phoenix Jira is to create Cells with added Tags; I am mentioning this just as a pointer.
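To make the retained-reference hazard concrete, here is a minimal Java sketch. It deliberately uses no HBase classes: `PooledBufferSketch`, `BufferBackedValue` and `fill` are hypothetical names made up for illustration, and a plain direct ByteBuffer stands in for the pooled RPC buffer. In a real CP the copy step would go through HBase's own cloning helpers (for example `CellUtil.cloneValue` for the value bytes) rather than hand-written code like this.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class PooledBufferSketch {

    /** Hypothetical stand-in for a Cell whose value bytes live in a shared buffer. */
    static final class BufferBackedValue {
        private final ByteBuffer buf;
        private final int offset;
        private final int length;

        BufferBackedValue(ByteBuffer buf, int offset, int length) {
            this.buf = buf;
            this.offset = offset;
            this.length = length;
        }

        /** Copies out whatever bytes are in the backing buffer right now. */
        private byte[] currentBytes() {
            byte[] out = new byte[length];
            ByteBuffer dup = buf.duplicate(); // independent position, same memory
            dup.position(offset);
            dup.get(out, 0, length);
            return out;
        }

        String read() {
            return new String(currentBytes(), StandardCharsets.UTF_8);
        }

        /** Analogue of cloning a cell: the copy owns a private on-heap array. */
        BufferBackedValue copy() {
            return new BufferBackedValue(ByteBuffer.wrap(currentBytes()), 0, length);
        }
    }

    /** Overwrites the start of the buffer, as a pool would when reusing it for a new payload. */
    static void fill(ByteBuffer buf, String payload) {
        ByteBuffer dup = buf.duplicate();
        dup.clear();
        dup.put(payload.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        ByteBuffer pooled = ByteBuffer.allocateDirect(16); // assumed pool buffer
        fill(pooled, "value-A");                           // first RPC payload

        BufferBackedValue retained = new BufferBackedValue(pooled, 0, 7);
        BufferBackedValue cloned = retained.copy();

        fill(pooled, "value-B");                           // buffer reused by a later call

        System.out.println("retained: " + retained.read()); // prints "retained: value-B"
        System.out.println("cloned:   " + cloned.read());   // prints "cloned:   value-A"
    }
}
```

The reference kept without a copy silently starts returning the later payload, while the copy keeps the original bytes and no longer pins the shared buffer.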
Anoop

On Thu, Dec 3, 2020 at 2:41 AM Geoffrey Jacoby <[email protected]> wrote:

> I'm code-reviewing a Phoenix PR [1] right now, which adds Tags to a
> mutation's Cells in a coproc. A question has come up regarding coprocs and
> the optional off-heaping of the write path in HBase 2.x and up.
>
> For what parts of the write path (and hence, which coproc hooks) is it safe
> to change the underlying Cells of a batch mutation without leaking off-heap
> memory?
>
> The HBase book entry on off-heap writes [2] just discusses the ability to
> make the MemStore off-heap, but HBASE-15179 and its design doc[3] say that
> the entire write stack is off-heap.
>
> Why this matters is if in a RegionObserver coproc hook (that's before the
> MemStore commit) the mutation Cells can be assumed to be on-heap, then
> clearing the internal family map of the mutation and replacing them with
> new, altered Cells is safe. (Extra GC pressure aside, of course.) If not, I
> presume the coproc would be leaking off-heap memory (unless there's magic
> cleanup somewhere?)
>
> If this is not a safe assumption, what would the recommended way be to
> alter a Cell's Tags in a coproc, since Tags are explicitly not exposed to
> the HBase client, Cells are immutable, and hence the only way to do so
> would be to create new Cells in a coproc? My question's not how to create
> the new Cells (that's been answered elsewhere) but how to dispose of the
> old, original ones.
>
> Also, if this is not a safe assumption, is there an accepted LP(Coproc) or
> Public API that a coproc can check to see if it's in an "off-heap" mode or
> not so that a leak can be avoided?
>
> Thanks,
>
> Geoffrey Jacoby
>
> References:
> [1] https://github.com/apache/phoenix/pull/978
> [2] https://hbase.apache.org/book.html#regionserver.offheap.writepath
> [3] https://docs.google.com/document/d/1fj5P8JeutQ-Uadb29ChDscMuMaJqaMNRI86C4k5S1rQ/edit
