milleruntime commented on pull request #1887:
URL: https://github.com/apache/accumulo/pull/1887#issuecomment-771677164
> @milleruntime do you have any tserver stack traces from when it got stuck?
> I have been poking around in the code looking for a possible cause, have not
> found anything yet.
There weren't any errors; I only observed the state of the tserver. It seemed
that no flushes from the memory manager were happening because the 4 tablets
chosen by `LargestFirstMemoryManager.tabletsToMinorCompact()` were being
rejected. Here is what was showing up in the logs repeatedly:
<pre>
2021-02-01T15:26:09,855 [memory.LargestFirstMemoryManager] DEBUG: COMPACTING 2k;6;5 total = 178,210,928 ingestMemory = 178,210,928
2021-02-01T15:26:09,855 [memory.LargestFirstMemoryManager] DEBUG: chosenMem = 2,614,281 chosenIT = 313.02 load 3,326,974
2021-02-01T15:26:09,855 [memory.LargestFirstMemoryManager] DEBUG: COMPACTING 2k;9;8 total = 178,210,928 ingestMemory = 178,210,928
2021-02-01T15:26:09,855 [memory.LargestFirstMemoryManager] DEBUG: chosenMem = 2,544,515 chosenIT = 313.02 load 3,238,194
2021-02-01T15:26:09,855 [memory.LargestFirstMemoryManager] DEBUG: COMPACTING 2k<;9 total = 178,210,928 ingestMemory = 178,210,928
2021-02-01T15:26:09,855 [memory.LargestFirstMemoryManager] DEBUG: chosenMem = 2,534,061 chosenIT = 313.02 load 3,224,885
2021-02-01T15:26:09,855 [memory.LargestFirstMemoryManager] DEBUG: COMPACTING 2k;4;3 total = 178,210,928 ingestMemory = 178,210,928
2021-02-01T15:26:09,855 [memory.LargestFirstMemoryManager] DEBUG: chosenMem = 2,431,564 chosenIT = 313.02 load 3,094,446
2021-02-01T15:26:09,855 [tablet.Tablet] DEBUG: Table 2k is being deleted so don't flush 2k;6;5
2021-02-01T15:26:09,855 [tserver.TabletServerResourceManager] INFO : Ignoring memory manager recommendation: not minor compacting 2k;6;5
2021-02-01T15:26:09,855 [tablet.Tablet] DEBUG: Table 2k is being deleted so don't flush 2k;9;8
2021-02-01T15:26:09,855 [tserver.TabletServerResourceManager] INFO : Ignoring memory manager recommendation: not minor compacting 2k;9;8
2021-02-01T15:26:09,855 [tablet.Tablet] DEBUG: Table 2k is being deleted so don't flush 2k<;9
2021-02-01T15:26:09,855 [tserver.TabletServerResourceManager] INFO : Ignoring memory manager recommendation: not minor compacting 2k<;9
2021-02-01T15:26:09,855 [tablet.Tablet] DEBUG: Table 2k is being deleted so don't flush 2k;4;3
2021-02-01T15:26:09,855 [tserver.TabletServerResourceManager] INFO : Ignoring memory manager recommendation: not minor compacting 2k;4;3
</pre>
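
To make the pattern in these logs concrete, here is a rough toy model of it, not the Accumulo code. The class `FlushSelectionSketch`, its `TabletInfo` type, and both methods are hypothetical names invented for illustration. The four `2k` sizes are the `chosenMem` values from the log, and the fifth tablet (`2j;m;a`) is made up. The sketch assumes selection is purely by memory size and that the tablet-side check rejects every tablet of a table that is being deleted.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class FlushSelectionSketch {

  // Hypothetical stand-in for a tablet: extent, owning table id, bytes held in memory.
  static final class TabletInfo {
    final String extent;
    final String tableId;
    final long memBytes;

    TabletInfo(String extent, String tableId, long memBytes) {
      this.extent = extent;
      this.tableId = tableId;
      this.memBytes = memBytes;
    }
  }

  // Toy "largest first" selection: the n tablets holding the most memory.
  static List<TabletInfo> chooseLargest(List<TabletInfo> tablets, int n) {
    return tablets.stream()
        .sorted(Comparator.comparingLong((TabletInfo t) -> t.memBytes).reversed())
        .limit(n)
        .collect(Collectors.toList());
  }

  // Toy tablet-side check: a tablet refuses to flush while its table is being deleted.
  static List<TabletInfo> flushable(List<TabletInfo> chosen, Set<String> deletingTables) {
    return chosen.stream()
        .filter(t -> !deletingTables.contains(t.tableId))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<TabletInfo> tablets = Arrays.asList(
        new TabletInfo("2k;6;5", "2k", 2_614_281L),
        new TabletInfo("2k;9;8", "2k", 2_544_515L),
        new TabletInfo("2k<;9", "2k", 2_534_061L),
        new TabletInfo("2k;4;3", "2k", 2_431_564L),
        new TabletInfo("2j;m;a", "2j", 1_000_000L)); // made-up tablet of another table

    List<TabletInfo> chosen = chooseLargest(tablets, 4);
    List<TabletInfo> accepted = flushable(chosen, Collections.singleton("2k"));

    // All four picks belong to the table being deleted, so nothing is flushed.
    System.out.println("chosen=" + chosen.size() + " flushable=" + accepted.size());
  }
}
```

In this model every pass picks the same four tablets and every pick is rejected, so no memory is freed and the smaller tablet from the other table is never considered. Whether the real selection logic behaves the same way would need to be checked against the code.
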
> > I think this wait in Tablet.completeClose() is what might have been
> > holding up the tablets from being closed:
>
> What made you think this?
Since the tablets should have been unloaded, I was looking through the code to
figure out what was preventing them from unloading. My guess was that
something else had triggered a flush for the tablet, calling
`Tablet.prepareForMinC()`, and that this would prevent the
`getTabletMemory().waitForMinC()` call in the close from finishing. But now
that I think about it more, if this had happened, nothing should have
prevented the other flush thread from completing, since the check I added is
in `Tablet.initiateMinorCompaction()`.
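
For reference, the interaction I was guessing at looks roughly like the sketch below. It is a simplified model and not the real `TabletMemory` implementation: the class `MinCWaitSketch` and the method `finishedMinC()` are hypothetical, and the timeout on `waitForMinC` is added only so the example terminates instead of blocking.

```java
public class MinCWaitSketch {

  // True between "prepare" and "finished" of a (modelled) minor compaction.
  private boolean minCInProgress = false;

  // A flush thread marks a minor compaction as started.
  synchronized void prepareForMinC() {
    minCInProgress = true;
  }

  // Hypothetical completion hook: clears the flag and wakes any waiting closer.
  synchronized void finishedMinC() {
    minCInProgress = false;
    notifyAll();
  }

  // Returns true once no minor compaction is pending, false if the timeout elapses first.
  synchronized boolean waitForMinC(long timeoutMillis) {
    long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
    while (minCInProgress) {
      long remainingNanos = deadline - System.nanoTime();
      if (remainingNanos <= 0) {
        return false;
      }
      try {
        // Wait at least 1 ms; wait(0) would block without a time limit.
        wait(Math.max(1L, remainingNanos / 1_000_000L));
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    MinCWaitSketch mem = new MinCWaitSketch();
    mem.prepareForMinC();
    System.out.println("close can proceed: " + mem.waitForMinC(50)); // compaction still pending
    mem.finishedMinC();
    System.out.println("close can proceed: " + mem.waitForMinC(50)); // compaction finished
  }
}
```

In this model the close only hangs if a compaction is prepared and then never finished, which is why the placement of the new check matters: a flush rejected before it is prepared would not set the flag.
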