LSM storage refactoring

Ildar Absalyamov Mon, 18 Sep 2017 22:16:44 -0700

Hi Devs,

In line with the earlier major structural refactorings of storage/index-related 
code [1], I would like to propose the next step in this cleanup [2].
The main problem I tried to solve with this patch is that the code responsible 
for the LSM disk/memory component lifecycle (creation, destruction, 
bulkloading, etc.) is scattered across factory methods in the corresponding 
index implementations, and much of it is duplicated between the various types 
of index components (bTrees, externalBTrees, externalBTreesWithBuddyBTree, 
rTrees, antimatterRTrees, invertedIndexes, etc.). Moreover, all these 
different disk/memory component implementations have a lot in common in how 
they manage the lifecycle of their parts (main indexes, bloom filters, 
buddyBTrees/deletedKeysBTrees).


This change removes much of the boilerplate from the LSM component-handling 
code and relies on a more object-oriented design to bring the logic for a 
particular element of a component into one place.
It also introduces a composable method of assembling bulkload pipelines: you 
can create a chain of operators, each responsible for bulkloading one piece of 
the component, and easily extend this pipeline with additional operations 
(calculating stats, inferring schema, etc.).
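
To illustrate the idea of a composable bulkload pipeline, here is a minimal 
Java sketch. It is not code from the patch: all names (BulkLoadOperator, 
IndexLoader, TupleCounter, ChainedBulkLoader) are hypothetical, and tuples are 
plain strings only to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; none of these types exist in the codebase.
final class BulkLoadPipelineSketch {

    /** Hypothetical stage of a bulkload chain. */
    interface BulkLoadOperator {
        void add(String tuple); // called once per tuple
        void end();             // called once when the load is finished
    }

    /** Stands in for bulkloading the main index of a disk component. */
    static final class IndexLoader implements BulkLoadOperator {
        final List<String> loaded = new ArrayList<>();
        boolean finished;

        @Override
        public void add(String tuple) {
            loaded.add(tuple);
        }

        @Override
        public void end() {
            finished = true;
        }
    }

    /** An additional stage, e.g. for statistics: counts the tuples it sees. */
    static final class TupleCounter implements BulkLoadOperator {
        long count;

        @Override
        public void add(String tuple) {
            count++;
        }

        @Override
        public void end() {
            // nothing to flush in this sketch
        }
    }

    /** Forwards every call to each stage, in the order the stages were added. */
    static final class ChainedBulkLoader implements BulkLoadOperator {
        private final List<BulkLoadOperator> stages = new ArrayList<>();

        ChainedBulkLoader then(BulkLoadOperator stage) {
            stages.add(stage);
            return this;
        }

        @Override
        public void add(String tuple) {
            for (BulkLoadOperator stage : stages) {
                stage.add(tuple);
            }
        }

        @Override
        public void end() {
            for (BulkLoadOperator stage : stages) {
                stage.end();
            }
        }
    }

    public static void main(String[] args) {
        IndexLoader index = new IndexLoader();
        TupleCounter stats = new TupleCounter();
        // Extending the pipeline is one more then(...) call.
        ChainedBulkLoader loader = new ChainedBulkLoader().then(index).then(stats);
        for (String tuple : new String[] {"a", "b", "c"}) {
            loader.add(tuple);
        }
        loader.end();
        // prints "3 tuples loaded, 3 counted"
        System.out.println(index.loaded.size() + " tuples loaded, " + stats.count + " counted");
    }
}
```

In this sketch, adding another operation (say, schema inference) means writing 
one more small operator and appending it to the chain, without touching the 
stage that loads the index.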

If you are interested or have an opinion on how this part of the codebase 
should be structured (or if it will break all your code in a private 
branch ;)), please have a look [2].

[1] https://asterix-gerrit.ics.uci.edu/#/c/1728/
[2] https://asterix-gerrit.ics.uci.edu/#/c/2014/

Best regards,
Ildar
