On Thu, Mar 21, 2019 at 4:16 PM Atin Mukherjee <[email protected]> wrote:
> All, > > In the last few releases of glusterfs, with stability as a primary theme > of the releases, there has been lots of changes done on the code > optimization with an expectation that such changes will have gluster to > provide better performance. While many of these changes do help, but off > late we have started seeing some diverse effects of them, one especially > being the calloc to malloc conversions. While I do understand that malloc > syscall will eliminate the extra memset bottleneck which calloc bears, but > with recent kernels having in-built strong compiler optimizations I am not > sure whether that makes any significant difference, but as I mentioned > earlier certainly if this isn't done carefully it can potentially introduce > lot of bugs and I'm writing this email to share one of such experiences. > > Sanju & I were having troubles for last two days to figure out why > https://review.gluster.org/#/c/glusterfs/+/22388/ wasn't working in > Sanju's system but it had no problems running the same fix in my gluster > containers. After spending a significant amount of time, what we now > figured out is that a malloc call [1] (which was a calloc earlier) is the > culprit here. As you all can see, in this function we allocate txn_id and > copy the event->txn_id into it through gf_uuid_copy () . But when we were > debugging this step wise through gdb, txn_id wasn't exactly copied with the > exact event->txn_id and it had some junk values which made the > glusterd_clear_txn_opinfo to be invoked with a wrong txn_id later on > resulting the leaks to remain the same which was the original intention of > the fix. > > This was quite painful to debug and we had to spend some time to figure > this out. Considering we have converted many such calls in past, I'd urge > that we review all such conversions and see if there're any side effects to > it. Otherwise we might end up running into many potential memory related > bugs later on. OTOH, going forward I'd request every patch > owners/maintainers to pay some special attention to these conversions and > see they are really beneficial and error free. IMO, general guideline > should be - for bigger buffers, malloc would make better sense but has to > be done carefully, for smaller size, we stick to calloc. > > What do others think about it? > I too am afraid of unknown effects of this change as much of the codebase relies on the assumption of zero-initialized data structures. I vote for reverting these patches unless it can be demonstrated that performance benefits are indeed significant. Otherwise the trade off in stability is not worth the cost. > > [1] > https://github.com/gluster/glusterfs/blob/master/xlators/mgmt/glusterd/src/glusterd-op-sm.c#L5681 > _______________________________________________ > maintainers mailing list > [email protected] > https://lists.gluster.org/mailman/listinfo/maintainers >
_______________________________________________ maintainers mailing list [email protected] https://lists.gluster.org/mailman/listinfo/maintainers
