Thanks for the explanation. It makes a lot of sense to me now. This may deserve to be added to the document so that newcomers like me can understand the reasoning behind it.

Tim Armstrong writes:

> There are various reasons - mainly we want more control over memory usage
> and accounting than shared_ptr allows.
>
> Generally we avoid shared_ptr in Impala since it makes it harder to reason
> about when resources are released. E.g. we typically want to know/control
> exactly when memory is freed up.
>
> Using shared_ptr doesn't help with accounting memory accurately against
> different plan nodes. E.g. if you have multiple join nodes in the same
> pipeline, and each of them is processing a batch that references the same
> disk io buffer, how do you attribute the memory? The most sensible approach
> is to have the bottom-most node be the "owner" of the resource, then
> transfer that ownership up by attaching it to the last batch that
> references it. To do that we need to explicitly know which the last batch
> is, so we have to explicitly track that anyway, which means that shared_ptr
> doesn't really help us manage memory lifetime.
>
> I can see some advantages to tracking all the resources each batch
> references (e.g. having non-owning and owning references) - it would
> make memory transfer issues easier to debug, but I don't think shared_ptr
> helps with that accounting.
>
> I think there may be some advantages to explicitly reference counting
> resources for debugging memory issues.
>
> On Wed, Aug 31, 2016 at 5:06 AM, Amos Bird <[email protected]> wrote:
>
>> Hi there,
>>
>> I'm reading
>> https://cwiki.apache.org/confluence/display/IMPALA/Impala+Row+Batches.
>> It says "If an operator is accumulating batches, this means that it must
>> be careful not to destroy or reset a batch if previous batches are still
>> in use, because this could release memory resources that are used by the
>> previous batches."
>>
>> This seems to be a good place to use shared_ptr. I'm curious why Impala
>> handles this problem using some sort of coding conventions. Is it
>> because we use MemPools?
>>
>> I may be very ignorant. Any explanation is highly appreciated!
>>
>> Regards,
>> Amos
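
To check my understanding of the "attach the resource to the last batch that references it" idea, here is a minimal C++ sketch. All names in it (Tracker, Buffer, Batch, Attach, TransferTo, Reset) are made up for illustration and are not Impala's actual classes or API. It only assumes that each buffer has exactly one owner at a time and that the bytes are charged to whichever node's tracker currently owns it.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical per-plan-node accounting counter (an assumption, not Impala's API).
struct Tracker {
  std::size_t bytes = 0;
};

// Hypothetical I/O buffer; it has exactly one owner at any time.
struct Buffer {
  explicit Buffer(std::size_t n) : data(n) {}
  std::vector<char> data;
};

// Hypothetical row batch. Rows may reference buffers the batch does not own;
// only the buffers it owns are freed by Reset().
class Batch {
 public:
  explicit Batch(Tracker* tracker) : tracker_(tracker) {}

  // Take ownership of a buffer and charge its size to this batch's node.
  void Attach(std::unique_ptr<Buffer> buf) {
    tracker_->bytes += buf->data.size();
    owned_.push_back(std::move(buf));
  }

  // Hand every owned buffer to dst; the accounting moves with the ownership.
  void TransferTo(Batch* dst) {
    if (dst == this) return;  // Nothing to do, and avoids mutating owned_ while iterating.
    for (auto& buf : owned_) {
      tracker_->bytes -= buf->data.size();
      dst->Attach(std::move(buf));
    }
    owned_.clear();
  }

  // Free owned buffers at a point the caller controls.
  void Reset() {
    for (auto& buf : owned_) tracker_->bytes -= buf->data.size();
    owned_.clear();
  }

  std::size_t num_owned() const { return owned_.size(); }

 private:
  Tracker* tracker_;
  std::vector<std::unique_ptr<Buffer>> owned_;
};
```

With this shape, resetting the lower batch after a transfer frees nothing, and the memory is released exactly when the upper batch is reset. A shared_ptr would keep the buffer alive as well, but it would not say which node's tracker should be charged at any given moment.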
