[PR] Avoids reading all fate ids into memory. [accumulo]

via GitHub Thu, 04 Jan 2024 15:16:14 -0800


keith-turner opened a new pull request, #4129:
URL: https://github.com/apache/accumulo/pull/4129


   As we change Accumulo to use FATE for per-tablet operations, it's important to 
avoid reading all of FATE's persisted data into memory. This commit modifies FATE 
to use Streams internally instead of Collections. For the Accumulo implementation 
of FATE storage, this makes it possible to have a Java stream backed by a scanner, 
which avoids reading all of the FATE ids into memory. The ZooKeeper storage 
implementation will still read everything into memory.
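
   To illustrate the idea of a stream backed by a scanner, here is a minimal 
sketch. It is not the code in this PR: `CountingScanner` is a hypothetical 
stand-in for a real scanner, and `LazyFateIdStream` is an illustrative name. 
It only shows that a stream built on an iterator pulls ids on demand and 
releases the underlying resource when the stream is closed.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

final class LazyFateIdStream {

  /** Hypothetical stand-in for a scanner: yields ids one at a time and counts reads. */
  static final class CountingScanner implements Iterator<Long>, AutoCloseable {
    private final long limit;
    private long next = 0;
    int reads = 0;
    boolean closed = false;

    CountingScanner(long limit) {
      this.limit = limit;
    }

    @Override
    public boolean hasNext() {
      return next < limit;
    }

    @Override
    public Long next() {
      if (!hasNext()) {
        throw new NoSuchElementException();
      }
      reads++;
      return next++;
    }

    @Override
    public void close() {
      closed = true;
    }
  }

  /** Wraps the scanner in a sequential stream; closing the stream closes the scanner. */
  static Stream<Long> stream(CountingScanner scanner) {
    Spliterator<Long> split = Spliterators.spliteratorUnknownSize(
        scanner, Spliterator.ORDERED | Spliterator.NONNULL);
    return StreamSupport.stream(split, false).onClose(scanner::close);
  }

  public static void main(String[] args) {
    CountingScanner scanner = new CountingScanner(1_000_000);
    // try-with-resources ensures the scanner is closed even if processing fails.
    try (Stream<Long> ids = stream(scanner)) {
      List<Long> firstThree = ids.limit(3).collect(Collectors.toList());
      System.out.println(firstThree + " read with " + scanner.reads + " scanner reads");
    }
    System.out.println("scanner closed: " + scanner.closed);
  }
}
```

   A caller that only needs a few ids never forces the remaining ids to be 
read, which is the property a Collection-based API cannot offer. The cost is 
that callers must close the stream.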
   
   Another change made in this PR optimizes the Accumulo storage 
layer to read the status while reading the id. Before this change, ids were 
read from a scanner, and then a new scanner was created for each id to read 
its status. Now the status and id are read as a stream from the same scanner, 
which should be much faster. This change was not possible for ZooKeeper; it 
will still make an RPC to get each status. It's OK that the ZooKeeper store is 
less efficient, as the Accumulo store will likely store orders of magnitude 
more data. It's probably not possible to make the same optimizations for speed 
and memory in the ZooKeeper store.
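
   The difference between the two access patterns can be sketched as follows. 
This is an assumption-laden illustration rather than the PR's code: `FakeTable` 
is a hypothetical in-memory stand-in for the persisted store, the status 
strings are only examples, and `scans` counts how many scans a real 
implementation would have to open.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class IdStatusScan {

  /** Hypothetical stand-in for the persisted store: id -> status, with a scan counter. */
  static final class FakeTable {
    private final TreeMap<Long, String> rows = new TreeMap<>();
    int scans = 0;

    void put(long id, String status) {
      rows.put(id, status);
    }

    /** One scan over all rows that returns only the ids. */
    Stream<Long> scanIds() {
      scans++;
      return rows.keySet().stream();
    }

    /** One scan restricted to a single id, returning its status. */
    String scanStatus(long id) {
      scans++;
      return rows.get(id);
    }

    /** One scan over all rows that returns the id and status together. */
    Stream<Map.Entry<Long, String>> scanAll() {
      scans++;
      return rows.entrySet().stream();
    }
  }

  /** Old shape: one scan for the ids, then one more scan per id for its status. */
  static Stream<Map.Entry<Long, String>> perIdLookups(FakeTable table) {
    return table.scanIds().map(id -> Map.entry(id, table.scanStatus(id)));
  }

  /** New shape: id and status are read from the same scan. */
  static Stream<Map.Entry<Long, String>> singlePass(FakeTable table) {
    return table.scanAll();
  }

  public static void main(String[] args) {
    FakeTable table = new FakeTable();
    table.put(1L, "NEW");
    table.put(2L, "IN_PROGRESS");
    table.put(3L, "SUCCESSFUL");

    List<Map.Entry<Long, String>> before = perIdLookups(table).collect(Collectors.toList());
    System.out.println("per-id lookups: " + before + ", scans=" + table.scans);

    table.scans = 0;
    List<Map.Entry<Long, String>> after = singlePass(table).collect(Collectors.toList());
    System.out.println("single pass:    " + after + ", scans=" + table.scans);
  }
}
```

   With N transactions the first shape needs N + 1 scans and the second needs 
one, which is why the ZooKeeper store, limited to one RPC per status, cannot 
match it.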
   
   A bug in the Fate integration test was fixed by using the Unknown status, 
which represents the status of a transaction that does not exist in the 
persisted store. I ran into this bug while testing these changes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

