Hi,

Anoop has answered this clearly, I believe. The short answer: in your CP it is better to copy/clone the cells so that you do not keep a reference to the original cells. If I remember correctly, the index-related WAL codec in Phoenix was also trying to do something similar (I may be wrong, though).
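As a rough illustration of why the clone matters, here is a toy Java model, assuming a pool that reuses direct ByteBuffers across RPC calls. BufferPool, CellView and simulate() are hypothetical names made up for this sketch, not HBase classes; in real CP code the copy would go through HBase's own clone helpers (for example CellUtil.cloneValue). A cell kept by reference ends up reading the next call's payload, while a copy taken during the call keeps its original value.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;

// Toy model of a pooled direct-buffer write path. BufferPool and CellView are
// hypothetical illustration classes, NOT HBase APIs.
public class PooledBufferDemo {

    // Hypothetical pool that hands out and reuses direct ByteBuffers.
    static final class BufferPool {
        private final ArrayDeque<ByteBuffer> free = new ArrayDeque<>();

        ByteBuffer acquire(int size) {
            ByteBuffer b = free.poll();
            if (b == null || b.capacity() < size) {
                b = ByteBuffer.allocateDirect(size);
            }
            b.clear(); // reset position/limit; old bytes stay until overwritten
            return b;
        }

        void release(ByteBuffer b) {
            free.push(b);
        }
    }

    // Hypothetical cell that is only a view (offset/length) over a shared buffer.
    static final class CellView {
        private final ByteBuffer buf;
        private final int offset;
        private final int length;

        CellView(ByteBuffer buf, int offset, int length) {
            this.buf = buf;
            this.offset = offset;
            this.length = length;
        }

        // Copies whatever bytes the backing buffer holds right now onto the heap.
        byte[] copyValue() {
            byte[] out = new byte[length];
            for (int i = 0; i < length; i++) {
                out[i] = buf.get(offset + i); // absolute get; does not move position
            }
            return out;
        }
    }

    // Returns {value read via the retained reference, value of the clone} after two calls.
    static String[] simulate() {
        BufferPool pool = new BufferPool();

        // Call 1: a coprocessor keeps a raw reference AND a clone taken during the call.
        ByteBuffer buf1 = pool.acquire(64);
        byte[] payload1 = "value-A".getBytes(StandardCharsets.UTF_8);
        buf1.put(payload1);
        CellView retained = new CellView(buf1, 0, payload1.length);
        byte[] cloned = retained.copyValue();
        pool.release(buf1); // call complete: RPC layer returns the buffer to the pool

        // Call 2: the pool hands the same buffer out for an unrelated payload.
        ByteBuffer buf2 = pool.acquire(64);
        buf2.put("value-B".getBytes(StandardCharsets.UTF_8));
        pool.release(buf2);

        return new String[] {
            new String(retained.copyValue(), StandardCharsets.UTF_8),
            new String(cloned, StandardCharsets.UTF_8)
        };
    }

    public static void main(String[] args) {
        String[] r = simulate();
        System.out.println("retained reference: " + r[0]); // value-B (corrupted)
        System.out.println("clone:              " + r[1]); // value-A
    }
}
```

Running main should show the retained reference reading value-B while the clone still holds value-A.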
Regards,
Ram

On Thu, Dec 3, 2020 at 9:30 AM Anoop John wrote:

> Hi Geoffrey,
>
> In case of an off-heap backed write path (at the RPC layer itself), the
> write payload is accepted into DBBs (DirectByteBuffers) that we get from a
> pool, and cells are created over this DBB. In case we add Tags in CPs, a new
> Cell POJO is created, but it still refers to the old POJO for all parts
> except the Tags. See TagRewriteCell for example. Anyway, only when we add
> cells to the MemStore do we read them from this RPC-side buffer. In the
> write path, once the call completes and comes back to the RPC layer, we
> release the buffer there. So there should be no worry of a leak.
> The only thing to be careful about in CPs is keeping references to Cells.
> In such cases, it's advised to clone the cell (or parts of it) and keep
> that reference instead. When the RPC side uses a pooled DBB, not doing this
> correctly can cause the Cell to be corrupted later. (The buffer is released
> once the RPC call is over and later reused to read some other write
> payload.) Even with on-heap buffers at the RPC layer, keeping such a
> reference without cloning can cause issues, as it will not allow the RPC
> payload read buffer (typically much larger than a cell) to be GCed. Anyway,
> I know the Phoenix Jira's aim is to create Cells with added tags; I am
> saying this just as a pointer.
>
> Anoop
>
> On Thu, Dec 3, 2020 at 2:41 AM Geoffrey Jacoby wrote:
>
> > I'm code-reviewing a Phoenix PR [1] right now, which adds Tags to a
> > mutation's Cells in a coproc. A question has come up regarding coprocs
> > and the optional off-heaping of the write path in HBase 2.x and up.
> >
> > For what parts of the write path (and hence, which coproc hooks) is it
> > safe to change the underlying Cells of a batch mutation without leaking
> > off-heap memory?
> >
> > The HBase book entry on off-heap writes [2] just discusses the ability to
> > make the MemStore off-heap, but HBASE-15179 and its design doc [3] say
> > that the entire write stack is off-heap.
> >
> > This matters because if, in a RegionObserver coproc hook that runs before
> > the MemStore commit, the mutation Cells can be assumed to be on-heap, then
> > clearing the internal family map of the mutation and replacing its Cells
> > with new, altered Cells is safe. (Extra GC pressure aside, of course.) If
> > not, I presume the coproc would be leaking off-heap memory (unless there's
> > magic cleanup somewhere?)
> >
> > If this is not a safe assumption, what would the recommended way be to
> > alter a Cell's Tags in a coproc, since Tags are explicitly not exposed to
> > the HBase client, Cells are immutable, and hence the only way to do so
> > would be to create new Cells in a coproc? My question's not how to create
> > the new Cells (that's been answered elsewhere) but how to dispose of the
> > old, original ones.
> >
> > Also, if this is not a safe assumption, is there an accepted LP(Coproc)
> > or Public API that a coproc can check to see whether it's in an "off-heap"
> > mode, so that a leak can be avoided?
> >
> > Thanks,
> >
> > Geoffrey Jacoby
> >
> > References:
> > [1] https://github.com/apache/phoenix/pull/978
> > [2] https://hbase.apache.org/book.html#regionserver.offheap.writepath
> > [3] https://docs.google.com/document/d/1fj5P8JeutQ-Uadb29ChDscMuMaJqaMNRI86C4k5S1rQ/edit
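Geoffrey's disposal question, together with Anoop's answer (a tag-added cell is a new object that still refers to the original for everything but the Tags, and the RPC layer releases the pooled buffer when the call completes), can be sketched as a toy Java model. Everything here (Cell, BufferCell, TagRewrittenCell, TrackingPool, addTag) is a hypothetical stand-in, not the HBase API; addTag only plays the role of a RegionObserver hook that runs before the MemStore write. The hook drops the old cells without freeing anything, and the outstanding-buffer count still returns to zero because the release happens at the RPC layer.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: Cell, BufferCell, TagRewrittenCell, TrackingPool and addTag are
// hypothetical stand-ins loosely following the thread, NOT HBase APIs.
public class TagRewriteDemo {
    interface Cell {
        String value();
        List<String> tags();
    }

    // Cell whose value lives in a slice of a pooled, direct (off-heap) buffer.
    static final class BufferCell implements Cell {
        private final ByteBuffer buf;
        private final int off, len;

        BufferCell(ByteBuffer buf, int off, int len) {
            this.buf = buf; this.off = off; this.len = len;
        }

        public String value() {
            byte[] b = new byte[len];
            for (int i = 0; i < len; i++) b[i] = buf.get(off + i); // absolute reads
            return new String(b, StandardCharsets.UTF_8);
        }

        public List<String> tags() { return List.of(); }
    }

    // New cell object carrying new tags; everything else is delegated to the original,
    // similar in spirit to the TagRewriteCell mentioned in the thread.
    static final class TagRewrittenCell implements Cell {
        private final Cell original;
        private final List<String> tags;

        TagRewrittenCell(Cell original, List<String> tags) {
            this.original = original; this.tags = List.copyOf(tags);
        }

        public String value() { return original.value(); } // still reads the pooled buffer
        public List<String> tags() { return tags; }
    }

    // Hypothetical pool that counts buffers handed out but not yet released.
    static final class TrackingPool {
        int outstanding = 0;
        ByteBuffer acquire(int size) { outstanding++; return ByteBuffer.allocateDirect(size); }
        void release(ByteBuffer b) { outstanding--; }
    }

    // Hook-style rewrite: replace every cell with a tag-rewritten wrapper.
    // The old cells are simply dropped from the map; nothing is freed here.
    static void addTag(Map<String, List<Cell>> familyMap, String tag) {
        for (Map.Entry<String, List<Cell>> e : familyMap.entrySet()) {
            List<Cell> rewritten = new ArrayList<>();
            for (Cell c : e.getValue()) {
                List<String> tags = new ArrayList<>(c.tags());
                tags.add(tag);
                rewritten.add(new TagRewrittenCell(c, tags));
            }
            e.setValue(rewritten);
        }
    }

    // One simulated write call; returns {value, tags, buffers still outstanding}.
    static String[] simulateCall() {
        TrackingPool pool = new TrackingPool();
        ByteBuffer buf = pool.acquire(64);
        byte[] payload = "v1".getBytes(StandardCharsets.UTF_8);
        buf.put(payload);

        Map<String, List<Cell>> familyMap = new TreeMap<>();
        familyMap.put("cf", new ArrayList<>(List.of(new BufferCell(buf, 0, payload.length))));
        addTag(familyMap, "ttl=100"); // runs before the MemStore write

        // The "MemStore" reads the data while the call, and its buffer, is still live.
        Cell c = familyMap.get("cf").get(0);
        String value = c.value();
        String tags = String.join(",", c.tags());

        pool.release(buf); // the RPC layer releases the buffer once the call completes
        return new String[] {value, tags, Integer.toString(pool.outstanding)};
    }

    public static void main(String[] args) {
        String[] r = simulateCall();
        System.out.println("value=" + r[0] + " tags=" + r[1] + " outstanding=" + r[2]);
    }
}
```

The wrapper's value() still reads the pooled buffer, so it is only safe to consume during the call; keeping it beyond the call reintroduces the corruption Anoop describes, which is why the advice to clone before retaining still applies.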
