[ https://issues.apache.org/jira/browse/KUDU-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16412895#comment-16412895 ]
Will Berkeley commented on KUDU-2376:
-------------------------------------

Forgot to mention something very important: the client doing the writing needs to be a different client than the one that does the alter.

> SIGSEGV while adding and dropping the same range partition and concurrently writing
> -----------------------------------------------------------------------------------
>
>                 Key: KUDU-2376
>                 URL: https://issues.apache.org/jira/browse/KUDU-2376
>             Project: Kudu
>          Issue Type: Bug
>    Affects Versions: 1.7.0
>            Reporter: Will Berkeley
>            Priority: Major
>        Attachments: alter_table-test.patch
>
>
> While adding a test to https://gerrit.cloudera.org/#/c/9393/, I ran into the problem that writing while doing a replace-tablet operation caused the client to segfault. After inspecting the client code, it looked like the same problem could occur if the same range partition was added and dropped with concurrent writes.
> Attached is a patch that adds a test to alter_table-test that reliably reproduces the segmentation fault.
> I don't totally understand what's happening, but here's what I think I have figured out:
> Suppose the range partition P = [0, 100) is dropped and re-added in a single alter. This causes the tablet X for hash bucket 0 and range partition P to be dropped, and a new tablet Y to be created for the same partition. There is a batch pending to X, which the client attempts to send to each of the replicas of X in turn. Once the replicas are exhausted, the client attempts to find a new leader with MetaCacheServerPicker::PickLeader, which triggers a master lookup to get the latest consensus info for X (#5 in the big comment in PickLeader). This calls LookupTabletByKey, which attempts a fast-path lookup. Assuming other metadata operations have already cached a tablet for Y, the tablet for X will have been removed from the by-table-and-by-key map, and the fast lookup will return an entry for Y.
> The client code doesn't know the difference, because the code paths only compare partition boundaries, which match for X and Y. The remote lookup to the master therefore never happens, and the client ends up in a pretty tight loop repeating the above process until the segfault.
> I'm not sure exactly what causes the segmentation fault. I looked at it a bit in gdb: the segfault was a few calls deep into STL maps in release mode, and inside a refcount increment in debug mode. I'll try to attach some gdb output showing that later.
> The problem is also hinted at in a TODO in PickLeader:
> {noformat}
> // TODO: When we support tablet splits, we should let the lookup shift
> // the write to another tablet (i.e. if it's since been split).
> {noformat}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
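The lookup loop described in the issue can be modeled abstractly. The sketch below is not Kudu's actual client code (all class and function names here are invented for illustration): it shows how a by-key cache that has replaced dropped tablet X with new tablet Y, combined with a caller that compares only partition boundaries, retries forever instead of re-routing the pending write.

```python
# Abstract model of the KUDU-2376 stale-lookup loop.
# All names are invented for illustration; this is NOT Kudu's real client code.

class RemoteTablet:
    def __init__(self, tablet_id, lower, upper):
        self.tablet_id = tablet_id
        self.partition = (lower, upper)  # half-open range [lower, upper)

class MetaCache:
    """By-key map: partition start key -> cached tablet."""
    def __init__(self):
        self.by_key = {}

    def cache(self, tablet):
        # A new tablet with the same start key replaces the old entry,
        # just as Y replaces X after the drop-and-re-add alter.
        self.by_key[tablet.partition[0]] = tablet

    def fast_path_lookup(self, key):
        # Returns whichever tablet currently covers `key`, even if the
        # caller's pending write targets a tablet that has been replaced.
        for tablet in self.by_key.values():
            lo, hi = tablet.partition
            if lo <= key < hi:
                return tablet
        return None

def pick_leader(cache, pending_write_tablet, key, max_retries=5):
    """Model of the retry loop: the caller compares only partition
    boundaries, so a replacement tablet with identical boundaries looks
    like 'the same tablet' and the loop never makes progress."""
    for _ in range(max_retries):
        looked_up = cache.fast_path_lookup(key)
        if looked_up.partition == pending_write_tablet.partition:
            # Boundaries match X's, so the stale entry is indistinguishable
            # from a fresh one; retry instead of re-routing.
            continue
        return looked_up  # a genuinely different tablet: re-route the write
    return None  # stuck looping; in the real client this preceded the segfault

# Partition P = [0, 100): tablet X is dropped and Y re-added in one alter.
cache = MetaCache()
x = RemoteTablet("X", 0, 100)
cache.cache(x)
y = RemoteTablet("Y", 0, 100)  # same boundaries as X
cache.cache(y)                 # replaces X in the by-key map

assert cache.fast_path_lookup(50).tablet_id == "Y"
assert pick_leader(cache, x, key=50) is None  # loops without progress
```

The model suggests why boundary comparison alone is insufficient: distinguishing X from Y needs an identity check (e.g. tablet ID), which is effectively what the TODO in PickLeader anticipates for tablet splits.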