Hi AsterixDB community,

I am Tejesh, and I am currently working on supporting Top-k nearest
neighbor queries for GSoC 2026. During my research on how R-tree memtables
are converted to SSTables, I encountered conflicting information and would
like to clarify which approach aligns with the AsterixDB architecture.

The two sources I found state the following:

 Source 1: When R-trees are converted from RAM to SSTables on disk, only
the leaf nodes are extracted, and the internal index entries are discarded
to save disk space. These leaf nodes are then sorted by Hilbert values and
stored sequentially to minimize random I/O, effectively destroying the
original R-tree structure and flattening it out. Whenever the R trees are
accessed, disk reads them sequentially
 Source 2: When R-trees are converted, the index entries are preserved to
reduce the search space, despite a slight increase in storage and random
I/O. The R-tree structure remains intact, with index entries pointing to
child node page IDs. These nodes are then grouped and sorted by Hilbert
values to reduce MBR overlaps. Here, disk reading may not be sequential; it
could jump between pages of Disk memory.

Could you please clarify which of these descriptions accurately reflects
how AsterixDB handles this conversion?

I have studied and understood secondary indexing, B-trees, R-trees,
LSM-ified R-trees, page handling, and developed a few high-level ideas to
solve the problem mainly by using a global priority queue. Now I am
navigating through the codebase to see how the logic and data flow of these
data structures are actually implemented and getting myself familiar with
the operators to actually implement my idea to work with Hyracks. I would
greatly appreciate it if I could receive a reply.

Best regards,

Tejesh

Reply via email to