milleruntime commented on pull request #1887:
URL: https://github.com/apache/accumulo/pull/1887#issuecomment-771677164


   > @milleruntime do you have any tserver stack traces from when it got stuck? 
I have been poking around in the code looking for a possible cause, have not 
found anything yet.
   
   There weren't any errors; I only observed the state of the tserver. It 
seemed that no flushes from the memory manager were happening because the 4 
tablets chosen by `LargestFirstMemoryManager.tabletsToMinorCompact()` were 
being rejected. Here is what was showing up in the logs repeatedly:
   <pre>
   2021-02-01T15:26:09,855 [memory.LargestFirstMemoryManager] DEBUG: COMPACTING 2k;6;5  total = 178,210,928 ingestMemory = 178,210,928
   2021-02-01T15:26:09,855 [memory.LargestFirstMemoryManager] DEBUG: chosenMem = 2,614,281 chosenIT = 313.02 load 3,326,974
   2021-02-01T15:26:09,855 [memory.LargestFirstMemoryManager] DEBUG: COMPACTING 2k;9;8  total = 178,210,928 ingestMemory = 178,210,928
   2021-02-01T15:26:09,855 [memory.LargestFirstMemoryManager] DEBUG: chosenMem = 2,544,515 chosenIT = 313.02 load 3,238,194
   2021-02-01T15:26:09,855 [memory.LargestFirstMemoryManager] DEBUG: COMPACTING 2k<;9  total = 178,210,928 ingestMemory = 178,210,928
   2021-02-01T15:26:09,855 [memory.LargestFirstMemoryManager] DEBUG: chosenMem = 2,534,061 chosenIT = 313.02 load 3,224,885
   2021-02-01T15:26:09,855 [memory.LargestFirstMemoryManager] DEBUG: COMPACTING 2k;4;3  total = 178,210,928 ingestMemory = 178,210,928
   2021-02-01T15:26:09,855 [memory.LargestFirstMemoryManager] DEBUG: chosenMem = 2,431,564 chosenIT = 313.02 load 3,094,446
   2021-02-01T15:26:09,855 [tablet.Tablet] DEBUG: Table 2k is being deleted so don't flush 2k;6;5
   2021-02-01T15:26:09,855 [tserver.TabletServerResourceManager] INFO : Ignoring memory manager recommendation: not minor compacting 2k;6;5
   2021-02-01T15:26:09,855 [tablet.Tablet] DEBUG: Table 2k is being deleted so don't flush 2k;9;8
   2021-02-01T15:26:09,855 [tserver.TabletServerResourceManager] INFO : Ignoring memory manager recommendation: not minor compacting 2k;9;8
   2021-02-01T15:26:09,855 [tablet.Tablet] DEBUG: Table 2k is being deleted so don't flush 2k<;9
   2021-02-01T15:26:09,855 [tserver.TabletServerResourceManager] INFO : Ignoring memory manager recommendation: not minor compacting 2k<;9
   2021-02-01T15:26:09,855 [tablet.Tablet] DEBUG: Table 2k is being deleted so don't flush 2k;4;3
   2021-02-01T15:26:09,855 [tserver.TabletServerResourceManager] INFO : Ignoring memory manager recommendation: not minor compacting 2k;4;3
   </pre>
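
   To make the stall in the logs above concrete, here is a rough, standalone 
sketch of the pattern as I understand it. It is not the Accumulo code: 
`StuckFlushSketch`, `TabletInfo`, `chooseLargest` and `runOnePass` are 
hypothetical names, the choice is simplified to memory size only (the logged 
`load` value suggests the real scoring uses more than that), and the `1a<` 
tablet and its size are made up for illustration. The 2k memory figures are 
the `chosenMem` values from the log.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch, not Accumulo code.
class StuckFlushSketch {

  // Stand-in for a tablet: its extent, in-memory size, and whether its table is being deleted.
  static final class TabletInfo {
    final String extent;
    final long memUsed;
    final boolean tableBeingDeleted;

    TabletInfo(String extent, long memUsed, boolean tableBeingDeleted) {
      this.extent = extent;
      this.memUsed = memUsed;
      this.tableBeingDeleted = tableBeingDeleted;
    }
  }

  // Simplified "largest first" choice: the n tablets holding the most memory.
  static List<TabletInfo> chooseLargest(List<TabletInfo> tablets, int n) {
    return tablets.stream()
        .sorted(Comparator.comparingLong((TabletInfo t) -> t.memUsed).reversed())
        .limit(n)
        .collect(Collectors.toList());
  }

  // One pass of the loop: returns how many chosen tablets were actually flushed.
  // A rejected tablet keeps its memory, so it is chosen again on the next pass.
  static int runOnePass(List<TabletInfo> tablets, int n) {
    int flushed = 0;
    for (TabletInfo t : chooseLargest(tablets, n)) {
      if (t.tableBeingDeleted) {
        continue; // "Ignoring memory manager recommendation"
      }
      tablets.remove(t); // model a flush as freeing the tablet's memory
      flushed++;
    }
    return flushed;
  }

  public static void main(String[] args) {
    List<TabletInfo> tablets = new ArrayList<>();
    tablets.add(new TabletInfo("2k;6;5", 2_614_281L, true));
    tablets.add(new TabletInfo("2k;9;8", 2_544_515L, true));
    tablets.add(new TabletInfo("2k<;9", 2_534_061L, true));
    tablets.add(new TabletInfo("2k;4;3", 2_431_564L, true));
    tablets.add(new TabletInfo("1a<", 1_000_000L, false)); // made-up tablet from another table

    for (int pass = 1; pass <= 3; pass++) {
      System.out.println("pass " + pass + " flushed " + runOnePass(tablets, 4));
    }
  }
}
```

   In this sketch every pass picks the same four 2k tablets, all four are 
rejected, and the smaller tablet that could be flushed is never reached.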
    
   > > I think this wait in Tablet.completeClose() is what might have been 
holding up the tablets from being closed:
   > 
   > What made you think this?
   
   Since the tablets should have been unloaded, I was just looking through the 
code to try to figure out what was preventing them from unloading. My guess 
was that something else had triggered a flush for the tablet, calling 
`Tablet.prepareForMinC()`, which would prevent the 
`getTabletMemory().waitForMinC()` call in the close from finishing. But now 
that I think about it more, if this had happened, nothing should have 
prevented the other flush thread from completing, since the check I added is 
in `Tablet.initiateMinorCompaction()`.
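
   For reference, the kind of wait I had in mind looks roughly like the 
sketch below. It is a generic wait/notify pattern, not the actual Accumulo 
implementation: the class `MinCWaitSketch` and the method `finishedMinC()` 
are hypothetical, the other method names only mirror the ones mentioned 
above, and the timeout is something I added so the sketch can be exercised 
without hanging.

```java
// Hypothetical sketch of a close path waiting on an in-progress minor compaction.
class MinCWaitSketch {
  private boolean minCInProgress = false;

  // A flush thread marks the start of a minor compaction.
  public synchronized void prepareForMinC() {
    minCInProgress = true;
  }

  // The flush thread marks completion and wakes any waiting closer.
  public synchronized void finishedMinC() {
    minCInProgress = false;
    notifyAll();
  }

  // Blocks until no minor compaction is in progress or the timeout elapses.
  // Returns true if the wait finished, false on timeout or interruption.
  public synchronized boolean waitForMinC(long timeoutMillis) {
    long deadline = System.currentTimeMillis() + timeoutMillis;
    while (minCInProgress) {
      long remaining = deadline - System.currentTimeMillis();
      if (remaining <= 0) {
        return false; // still in progress: the close would stay blocked
      }
      try {
        wait(remaining);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return true;
  }
}
```

   The close only stays blocked if the thread that called `prepareForMinC()` 
never reaches the completion step, which is why a check made earlier, before 
the compaction is prepared, should not cause this on its own.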



