Will Berkeley created KUDU-2376:
-----------------------------------

             Summary: SIGSEGV while adding and dropping the same range 
partition and concurrently writing
                 Key: KUDU-2376
                 URL: https://issues.apache.org/jira/browse/KUDU-2376
             Project: Kudu
          Issue Type: Bug
    Affects Versions: 1.7.0
            Reporter: Will Berkeley
         Attachments: alter_table-test.patch

While adding a test to https://gerrit.cloudera.org/#/c/9393/, I ran into the 
problem that writing while doing a replace tablet operation caused the client 
to segfault. After inspecting the client code, it looked like the same problem 
could occur if the same range partition was added and dropped with concurrent 
writes.

Attached is a patch that adds a test to alter_table-test that reliably 
reproduces the segmentation fault.

I don't totally understand what's happening, but here's what I think I have 
figured out:

Suppose the range partition P=[0, 100) is dropped and re-added in a single 
alter operation. This causes the tablet X for hash bucket 0 and range 
partition P to be dropped, and a new tablet Y to be created for the same 
partition. There is a batch pending to X, which the client attempts to send 
to each of the replicas of X in turn. Once the replicas are exhausted, the 
client attempts to find a new leader with MetaCacheServerPicker::PickLeader, 
which triggers a master lookup to get the latest consensus info for X (#5 in 
the big comment in PickLeader). This calls LookupTabletByKey, which attempts 
a fast path lookup. Assuming other metadata operations have already cached a 
tablet entry for Y, the entry for X will have been removed from the 
by-table-and-by-key map, and the fast path lookup will return an entry for Y. 
The client code doesn't notice the difference because the code paths only 
compare partition boundaries, which match for X and Y. The master lookup 
therefore never actually happens, and the client ends up in a pretty tight 
loop repeating the above process until it segfaults.

I'm not sure exactly what the segmentation fault is. I looked at it a bit in 
gdb and the segfault was a few calls deep into STL maps in release mode and 
inside a refcount increment in debug mode. I'll try to attach some gdb output 
showing that later.

The problem is also hinted at in a TODO in PickLeader:
{noformat}
// TODO: When we support tablet splits, we should let the lookup shift
// the write to another tablet (i.e. if it's since been split).
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)