Awesome hunting and write-up, Dylan!
Dylan Hutchison wrote:
Hi all,
Check out ACCUMULO-4229
<https://issues.apache.org/jira/browse/ACCUMULO-4229>. I've copied the text
below to include it in our discussion.
This issue came up again during a similar long-running BatchWriter session. It had
the same circumstances: the failures started happening after a split event.
I turned on TRACE logs and I think I pinned it down: the TabletLocator
cached by a BatchWriter gets out of sync with the static cache of
TabletLocators.
1. The TabletServerBatchWriter caches a TabletLocator from the static
collection of TabletLocators when it starts writing. Suppose it is writing
to tablet T1.
2. The TabletServerBatchWriter uses its locally cached TabletLocator
inside its `binMutations` method for its entire lifespan; this cache is
never refreshed or updated to sync up with the static collection of
TabletLocators.
3. Every hour, the static collection of TabletLocators clears itself.
The next call to get a TabletLocator from the static collection allocates a
new TabletLocator. Unfortunately, the TabletServerBatchWriter does not
reflect this change and continues to use the old, locally cached
TabletLocator.
4. Tablet T1 splits into T2 and T3, which closes T1. As such, T1 no
longer exists, and the entries the tablet server receives that were meant
for T1 all fail to write because T1 is closed.
5. The TabletServerBatchWriter receives the response from the tablet
server that all entries failed to write. It invalidates the cache of the
*new* TabletLocator obtained from the static collection of TabletLocators.
The old TabletLocator that is cached locally does not get invalidated.
6. The TabletServerBatchWriter re-queues the failed entries and tries to
write them to the same closed tablet T1, because it is still looking up
tablets using the old TabletLocator.
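The lifecycle above can be boiled down to a small sketch. This is illustrative only; the class and method names are hypothetical stand-ins for the real TabletLocator machinery:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the desync: a writer grabs a locator from a static cache once,
// the static cache is later cleared, and invalidations then hit a *new*
// locator instance while the writer keeps using the stale one.
public class StaleLocatorSketch {
    // Stands in for the static collection of TabletLocators.
    static final Map<String, Locator> LOCATORS = new HashMap<>();

    static class Locator {
        boolean invalidated = false;
        void invalidateCache() { invalidated = true; }
    }

    static Locator getLocator(String table) {
        return LOCATORS.computeIfAbsent(table, t -> new Locator());
    }

    public static void main(String[] args) {
        // Step 1: the writer caches a locator locally at construction time.
        Locator writersCopy = getLocator("vb");

        // Step 3: the hourly clear wipes the static collection.
        LOCATORS.clear();

        // Step 5: a failure invalidates the locator fetched *now*, which is
        // a brand-new instance, not the one the writer still holds.
        getLocator("vb").invalidateCache();

        // Step 6: the writer's stale copy was never invalidated, so it
        // keeps binning mutations to the closed tablet.
        System.out.println("writer's locator invalidated? " + writersCopy.invalidated);
        System.out.println("current locator invalidated? " + getLocator("vb").invalidated);
    }
}
```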
This behavior subsumes the circumstances William wrote about in the thread
<https://mail-archives.apache.org/mod_mbox/accumulo-user/201406.mbox/%3ccamz+duvmmhegon9ejehr9h_rrpp50l2qz53bbdruvo0pira...@mail.gmail.com%3E>
he mentioned. The problem would occur as a result of either splits or
major compactions. It would only stop the BatchWriter if its entire memory
filled up with writes to the same tablet that was closed as a result of a
majc or split; otherwise it would just slow down the BatchWriter by failing
to write some number of entries with every RPC.
There are a few solutions we can think of.
1. Not have the MutationWriter inside the TabletServerBatchWriter
locally cache TabletLocators. I suspect this was done for performance
reasons, so it's probably not a good solution.
2. Have all the MutationWriters clear their caches at the same time the
static TabletLocator cache clears. I like this one. We could store a
reference to the Map that each MutationWriter has inside a static
synchronized WeakHashMap. The only times the weak map needs to be accessed
are:
1. When a MutationWriter is constructed (from constructing a
TabletServerBatchWriter), add its new local TabletLocator cache to the weak
map.
2. When the static TabletLocator cache is cleared, also clear every
map in the weak map.
3. Another solution is to make the invalidate calls on the local
TabletLocator cache rather than the global static one. If we go this route,
we should double-check the idea to make sure it does not impact the
correctness of any other code that uses the cache. I like the
previous idea better.
The TimeoutTabletLocator does not help when no timeout is set on the
BatchWriter (the default behavior).
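To make the second option concrete, here is a rough sketch of the WeakHashMap registry. The names are hypothetical and stand in for the actual TabletServerBatchWriter internals:

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

// Sketch of option 2: register each MutationWriter's local locator cache
// in a static synchronized WeakHashMap, and clear all registered caches
// whenever the static TabletLocator cache is cleared.
public class LocatorCacheRegistry {
    // Weak keys: a MutationWriter's cache drops out of the registry
    // automatically once the writer itself is garbage collected.
    private static final Map<Map<String, Object>, Boolean> REGISTERED =
        Collections.synchronizedMap(new WeakHashMap<>());

    // Called from the (hypothetical) MutationWriter constructor.
    public static void register(Map<String, Object> localCache) {
        REGISTERED.put(localCache, Boolean.TRUE);
    }

    // Called wherever the static TabletLocator cache is cleared (e.g. the
    // hourly clear), so local caches never outlive the static one.
    public static void clearAll() {
        // Must hold the wrapper's lock while iterating a synchronizedMap.
        synchronized (REGISTERED) {
            for (Map<String, Object> cache : REGISTERED.keySet()) {
                cache.clear();
            }
        }
    }
}
```

The weak keys mean the registry never pins a dead BatchWriter's cache in memory, which is the main reason to prefer a WeakHashMap over a plain list here.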
On Tue, Apr 19, 2016 at 8:09 PM, William Slacum <[email protected]> wrote:
Good digs, Dylan. I don't think it's too rare to matter. I notice it often
during MR jobs, and there's usually a point where I give up and just start
writing RFiles.
It could possibly be related to what I saw back in the day with:
https://mail-archives.apache.org/mod_mbox/accumulo-user/201406.mbox/%3ccamz+duvmmhegon9ejehr9h_rrpp50l2qz53bbdruvo0pira...@mail.gmail.com%3E
On Tue, Apr 19, 2016 at 6:26 PM, Josh Elser <[email protected]> wrote:
Nice findings. Sorry I haven't had any cycles to dig into this myself.
I look forward to hearing what you find :)
Dylan Hutchison wrote:
I investigated a bit more and I am pretty sure the problem is that the
BatchWriter is not recognizing that the tablet vb<< split into vb;2436<
and vb<;2436. It keeps trying to update the closed tablet vb<<. Each
update writes 0 mutations and records a failure at the tablet server
UpdateSession because vb<< is closed.
I'm not sure why this is happening, because the BatchWriter should have
invalidated its tablet locator cache upon recognizing a failure. Then it
would recognize that the entries it wants to write fall into the new
tablets vb;2436< and vb<;2436. I think there is a timing bug for this edge
case, when a table split occurs during heavy writes.
I will write this up if I can reproduce it. Maybe it is too rare to
matter.
Cheers, Dylan
On Mon, Apr 18, 2016 at 2:38 PM, Dylan Hutchison <[email protected]> wrote:
Hi devs,
I'd like to ask your help in figuring out what is happening to a
BatchWriter. The following gives my reasoning so far.
In Accumulo 1.7.1, I have a BatchWriter that is stuck in WAITING status in
its addMutation method. I saw that it is stuck by jstack'ing the Accumulo
client. It's been stuck like this for 16 hours.
The BatchWriter is supposed to wait when a mutation is added if no
failures have been recorded and either (a) the total memory used exceeds the
maximum allowed for the BatchWriter, or (b) the BatchWriter is currently
flushing. So we conclude that one of (a) or (b) occurred and no
failures were recorded at the time addMutation was called. I think
(a) is likely.
The BatchWriter is supposed to notify itself when (1) a flush
finishes, (2) a constraint violation, authorization failure, server
error, or unknown error occurs, or (3) memory usage decreases, which happens
when entries are successfully sent to the tablet server. Since the BatchWriter
is stuck in WAITING, none of these conditions is occurring.
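For context, the wait/notify behavior described above follows the standard Java guarded-wait pattern. Here is a minimal sketch with hypothetical fields; the real TabletServerBatchWriter tracks more state than this:

```java
// Sketch of the guarded wait assumed above: addMutation() blocks while
// memory is full or a flush is in progress, and every event that could
// release it must notify. If the releasing event never happens (e.g. sends
// keep failing without recording a failure), the caller waits forever.
public class WaitSketch {
    private long memoryUsed = 0;
    private final long maxMemory;
    private boolean flushing = false;
    private boolean failed = false;

    public WaitSketch(long maxMemory) { this.maxMemory = maxMemory; }

    public synchronized void addMutation(long size) throws InterruptedException {
        // Wait while no failure is recorded and (a) memory is over the
        // limit or (b) a flush is in progress.
        while (!failed && (memoryUsed >= maxMemory || flushing)) {
            wait(); // the caller shows as WAITING in jstack here
        }
        memoryUsed += size;
    }

    // Entries successfully sent: memory drops, so wake waiting producers.
    public synchronized void sendSucceeded(long size) {
        memoryUsed -= size;
        notifyAll();
    }

    // A recorded failure also releases the waiter.
    public synchronized void recordFailure() {
        failed = true;
        notifyAll();
    }

    public synchronized long memoryUsed() { return memoryUsed; }
}
```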
The BatchWriter has 3 write threads (the default number). All three have
status TIMED_WAITING (parked) in jstack. Their stack traces don't give
useful information.
Here's what I can tell from the tserver logs. A new table (and tablet)
was created successfully. The BatchWriter started writing to this tablet
steadily. The logs show that the tablet (vb<<) flushed every 5 seconds or
so and major compacted at a steady periodic rate.
Everything looks good until vb<< grew large enough that it needed
splitting. This occurred about 42 minutes after the BatchWriter started
writing entries. The logs show a failure in an UpdateSession that popped
up in the middle of the split operation. This failure continues to show
for the next 15 hours.
I copied the portion of the tserver logs that looks relevant to the split
below. I highlighted the line reporting the first failure. It occurs
between when the split starts and when it finishes.
Any idea what could have caused this? I don't know if the failure is
related to the BatchWriter being stuck in WAITING, but it seems likely. I
think it is weird that the 3 write threads are all idle; at least one of
them should be doing something if the thread calling addMutation() is
waiting.
Here is a pastebin of the jstack <http://pastebin.com/TLfdrDwX>, though I
think I've written out the useful parts from it above.
2016-04-17 22:38:06,436 [tablet.Tablet] TABLET_HIST: vb<< closed
2016-04-17 22:38:06,439 [tablet.Tablet] DEBUG: Files for low split vb;2436<
[hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C00008lh.rf,
hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C00009iz.rf,
hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C0000a08.rf,
hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C0000a4t.rf,
hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C0000a7m.rf,
hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C0000a8f.rf,
hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C0000a8n.rf,
hdfs://localhost:9000/accumulo/tables/vb/default_tablet/F0000a8p.rf,
hdfs://localhost:9000/accumulo/tables/vb/default_tablet/F0000a8q.rf,
hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000a93.rf,
hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000a9b.rf,
hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000a9g.rf,
hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000aa7.rf,
hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000aap.rf,
hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000abe.rf,
hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000abn.rf]
2016-04-17 22:38:06,439 [tablet.Tablet] DEBUG: Files for high split vb<;2436
[hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C00008lh.rf,
hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C00009iz.rf,
hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C0000a08.rf,
hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C0000a4t.rf,
hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C0000a7m.rf,
hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C0000a8f.rf,
hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C0000a8n.rf,
hdfs://localhost:9000/accumulo/tables/vb/default_tablet/F0000a8p.rf,
hdfs://localhost:9000/accumulo/tables/vb/default_tablet/F0000a8q.rf,
hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000a93.rf,
hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000a9b.rf,
hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000a9g.rf,
hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000aa7.rf,
hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000aap.rf,
hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000abe.rf,
hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000abn.rf]
2016-04-17 22:38:06,440 [tserver.TabletServer] DEBUG: ScanSess tid 127.0.0.1:42302 !0 0 entries in 0.00 secs, nbTimes = [0 0 0.00 1]
2016-04-17 22:38:06,472 [tablet.Tablet] TABLET_HIST: vb<< split vb;2436< vb<;2436
2016-04-17 22:38:06,472 [tablet.Tablet] DEBUG: offline split time : 3.22 secs
*2016-04-17 22:38:06,472 [tserver.TabletServer] INFO : Starting split: vb<<*
2016-04-17 22:38:06,472 [tserver.TabletServer] DEBUG: UpSess 127.0.0.1:41090 0 in 3.059s, at=[0 0 0.00 1] ft=3.010s(pt=3.010s lt=0.000s ct=0.000s)
2016-04-17 22:38:06,472 [tserver.NativeMap] DEBUG: Allocated native map 0x0000000009a20e60
*2016-04-17 22:38:06,473 [tserver.TabletServer] DEBUG: Failures: 1, first extent vb<< successful commits: 0*
2016-04-17 22:38:06,473 [tablet.Tablet] DEBUG: Reloading constraints for extent: vb;2436<
2016-04-17 22:38:06,473 [constraints.ConstraintChecker] DEBUG: Loaded constraint org.apache.accumulo.core.constraints.DefaultKeySizeConstraint for vb
2016-04-17 22:38:06,474 [tablet.Tablet] TABLET_HIST: vb;2436< opened
2016-04-17 22:38:06,474 [tserver.NativeMap] DEBUG: Allocated native map 0x0000000003edf090
2016-04-17 22:38:06,474 [tablet.Tablet] DEBUG: Reloading constraints for extent: vb<;2436
2016-04-17 22:38:06,474 [constraints.ConstraintChecker] DEBUG: Loaded constraint org.apache.accumulo.core.constraints.DefaultKeySizeConstraint for vb
2016-04-17 22:38:06,475 [tablet.Tablet] TABLET_HIST: vb<;2436 opened
*2016-04-17 22:38:06,475 [tserver.TabletServer] INFO : Tablet split: vb<< size0 539900144 size1 539900148 time 8268ms*
2016-04-17 22:38:06,594 [tserver.TabletServer] DEBUG: MultiScanSess 127.0.0.1:51092 0 entries in 0.00 secs (lookup_time:0.00 secs tablets:1 ranges:1)
2016-04-17 22:38:06,597 [tserver.TabletServer] DEBUG: MultiScanSess 127.0.0.1:51092 0 entries in 0.00 secs (lookup_time:0.00 secs tablets:1 ranges:1)
2016-04-17 22:38:07,568 [tserver.TabletServer] DEBUG: ScanSess tid 127.0.0.1:42307 !0 0 entries in 0.00 secs, nbTimes = [1 1 1.00 1]
2016-04-17 22:38:07,571 [zookeeper.DistributedWorkQueue] DEBUG: Looking for work in /accumulo/c7998e9d-129d-4c61-9732-45bb6bcab3f0/bulk_failed_copyq
2016-04-17 22:38:07,940 [tserver.TabletServer] DEBUG: UpSess 127.0.0.1:41090 0 in 0.067s, at=[1 1 1.00 1] ft=0.000s(pt=0.000s lt=0.000s ct=0.000s)
2016-04-17 22:38:07,941 [tserver.TabletServer] DEBUG: Failures: 1, first extent vb<< successful commits: 0