On Tue, Mar 24, 2026 at 5:02 AM Divij Agarwal <[email protected]> wrote:
> Hi Shiva,
>

Hi Divij,

Good progress and good questions!

> Thanks for your guidance.
>
> I went through the SQL++ primer and looked at the logical query plans
> and Hyracks job specifications for different queries using the "Explain"
> functionality in the UI. I also started reading the Hyracks paper and the
> memory adaptive hash join thesis mentioned in another thread, and I am
> working through them to better understand the execution and memory model.
>
> In the meantime, I had a few high level questions:
>
> - In practice, which tends to be the bigger issue with static memory
> allocation: under-allocation or over-allocation?
>

I guess both are big issues, but which one happens more often depends on
the system: if it expects many queries to arrive, it will suffer from
under-allocation, but if it is optimistic, it will suffer from
over-allocation. In the current design of AsterixDB, each memory-intensive
operator gets a memory budget that is either the default value or specified
by the user, and in either case it can be quite "inaccurate".
Whether you see over-allocation or under-allocation depends on how
inaccurate the memory budgets are and in which direction (did the user
underestimate or overestimate?). I would recommend checking these two
resources if you get the time:

- https://scholarcommons.scu.edu/cseng_mstr/35/
- Diane L. Davison and Goetz Graefe. “Dynamic Resource Brokering for
  Multi-User Query Execution”. In: Proceedings of the 1995 ACM SIGMOD
  International Conference on Management of Data. SIGMOD ’95. San Jose,
  California, USA: Association for Computing Machinery, 1995, pp. 281–292.
  isbn: 0897917316. doi: 10.1145/223784.223845.
  url: https://doi.org/10.1145/223784.223845.

> - For dynamic memory management, are there specific operators that would be
> most impactful to prioritize first?
>

There are 5-6 memory-intensive operators, but you can start with Hybrid
Hash Join or Sort (used in group by and a few other operators).
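To make the over/under-allocation point above concrete, here is a toy sketch. It is not AsterixDB code: the function name `classify_budget`, the use of frames as the unit, and the numbers in the usage note are all assumptions made up for illustration. It only compares a static per-operator budget with what the operator turns out to need.

```python
def classify_budget(budget_frames, needed_frames):
    """Compare a static per-operator budget with the operator's real need.

    Hypothetical helper for illustration only. Returns a (label, gap)
    pair, where gap is the size of the mismatch in frames.
    """
    if budget_frames < 0 or needed_frames < 0:
        raise ValueError("frame counts must be non-negative")
    if needed_frames > budget_frames:
        # Under-allocation: the operator cannot hold the excess in memory
        # and would have to spill it to disk.
        return ("under", needed_frames - budget_frames)
    if needed_frames < budget_frames:
        # Over-allocation: the unused frames sit idle while still being
        # reserved, so other operators cannot use them.
        return ("over", budget_frames - needed_frames)
    return ("exact", 0)
```

For example, with an assumed budget of 32 frames, an operator that needs 100 frames is under-allocated by 68, while one that needs 8 frames is over-allocated by 24. The direction of the error depends on the guess, not on the operator itself.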
>
> - Is the primary objective to optimize for fairness across concurrent
> queries, overall throughput, or predictability/tail latency?
>

We can provide a few built-in goals, but the design should be generic
enough that users can define their own performance goal. The performance
goal is the user's choice.

>
> - My understanding is that, with Hyracks (still going over the paper…),
> execution is scheduled across compute nodes first, and dynamic memory
> management decisions will be made locally at each node. Is this accurate,
> or are there cases where dynamic memory behavior needs coordination across
> nodes at the query level?
>

Locally sounds good, but the logic of your approach should ensure there can
never be a deadlock. For example, an instance of a query on a data
partition should not run out of memory, because if that happens your query
partially fails.

>
> I’ll continue going through the papers and will send a draft proposal soon.
>
> Best,
> Divij Agarwal
>

Best,
Shiva

> On Tue, Feb 24, 2026 at 7:02 PM Shiva Jahangiri <[email protected]>
> wrote:
>
> > Hi Divij,
> >
> > Thanks for your interest in AsterixDB and this project! Currently there
> > isn’t any specific Jira issue with regard to this project, but there are
> > a few ways you can familiarize yourself with sections of the code that
> > are relevant to this project.
> >
> > It’s great that you have got AsterixDB running! Next steps would be to
> > try the SQL++ Primer
> > <https://asterixdb.apache.org/docs/0.9.9/sqlpp/primer-sqlpp.html>
> > with its toy datasets and queries to familiarize yourself with some of
> > the features of AsterixDB. If interested, select the Optimized Logical
> > Plan and Hyracks Jobs from AsterixDB's UI and explore them further!
> >
> > If you want to be able to debug AsterixDB locally through IntelliJ or
> > your preferred IDE, you can follow these steps
> > <https://scudbis.notion.site/Guide-to-Setup-Asterix-Locally-6cc6da8e8130483f9bc9e2a51ccbcc71>.
> > Then you can try to find and follow how memory intensive operators (e.g.
> > OptimizedHybridHashJoinOperatorDescriptor.java) work, if you are
> > interested. This should already be plenty! :-)
> >
> > Let us know if you have any questions!
> >
> > Best,
> >
> > Shiva Jahangiri
> > Assistant Professor in Computer Science and Engineering Department
> > Santa Clara University
> >
> > On Tue, Feb 24, 2026 at 5:02 AM Divij Agarwal <[email protected]>
> > wrote:
> >
> > > Hi all,
> > >
> > > I am Divij Agarwal, a Computer Science junior at Purdue University. I
> > > have been a Java teaching assistant at my university for the past two
> > > years and am interested in the Dynamic Memory Management project for
> > > GSoC 2026. I implemented a custom malloc in C as part of coursework,
> > > which sparked my interest in systems programming and memory allocation
> > > strategies.
> > >
> > > I was also able to build and run AsterixDB locally. Are there any
> > > issues or starter tasks related to memory management that I can begin
> > > working on? I would appreciate guidance on where to start contributing.
> > >
> > > Best regards,
> > >
> > > Divij Agarwal
> > >
> >

--
Shiva Jahangiri
Assistant Professor in Computer Science and Engineering Department
Santa Clara University
