On Tue, Mar 24, 2026 at 5:02 AM Divij Agarwal <[email protected]> wrote:

> Hi Shiva,
>
Hi Divij,
Good progress and good questions!

> Thanks for your guidance.
>
> I went through the SQL++ primer and looked at the logical query plans
> and Hyracks job specifications for different queries using the "Explain"
> functionality in the UI. I also started reading the Hyracks paper and the
> memory adaptive hash join thesis mentioned in another thread, and I am
> working through them to better understand the execution and memory model.
>
> In the meantime, I had a few high level questions:
>
> - In practice, which tends to be the bigger issue with static memory
> allocation: under-allocation or over-allocation?
>
I think both are serious problems, but which one occurs more often depends on
the system: if it expects many queries to arrive, it will suffer from
under-allocation, and if it is optimistic, it will suffer from
over-allocation. In the current design of AsterixDB, each memory-intensive
operator is allocated a memory budget, which is either the default value or
one specified by the user; in either case it can be quite "inaccurate".
Whether you see over-allocation or under-allocation depends on how inaccurate
the memory budgets are and in which direction (did the user underestimate
or overestimate?). I would recommend checking these two resources if
you get the time:
https://scholarcommons.scu.edu/cseng_mstr/35/

Diane L. Davison and Goetz Graefe. “Dynamic Resource Brokering for
Multi-User Query Execution”. In: Proceedings of the 1995 ACM SIGMOD
International Conference on Management of Data. SIGMOD ’95. San Jose,
California, USA: Association for Computing Machinery, 1995, pp. 281–292.
isbn: 0897917316. doi: 10.1145/223784.223845. url:
https://doi.org/10.1145/223784.223845.

> - For dynamic memory management, are there specific operators that would be
> most impactful to prioritize first?
>
There are 5-6 memory-intensive operators, but you can start with Hybrid Hash
Join or Sort (used in group-by and a few other operators).

>
> - Is the primary objective to optimize for fairness across concurrent
> queries, overall throughput, or predictability/tail latency?
>
We can provide a few built-in goals, but the design should be generic enough
that users can define their own performance goal. The choice of performance
goal is up to the user.
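
To make the idea of a user-defined goal a bit more concrete, here is a
minimal Java sketch. It is not AsterixDB code: the AllocationGoal interface
and both implementations are hypothetical names made up for illustration, and
budgets are counted in abstract "frames". The only point is that the rule for
splitting a node's budget can sit behind one small interface, so a different
goal can be plugged in without touching the operators.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical plug-in point: splits a node's frame budget among operators. */
interface AllocationGoal {
    /** Returns one grant per request; the grants never sum to more than totalFrames. */
    int[] divide(int totalFrames, int[] requestedFrames);
}

/** Fairness-style goal (assumed): equal shares, capped at what each operator asked for. */
final class EqualShare implements AllocationGoal {
    @Override
    public int[] divide(int totalFrames, int[] requestedFrames) {
        int n = requestedFrames.length;
        int[] grants = new int[n];
        if (n == 0) {
            return grants;
        }
        int share = totalFrames / n;
        for (int i = 0; i < n; i++) {
            grants[i] = Math.min(share, requestedFrames[i]);
        }
        return grants;
    }
}

/** Throughput-style goal (assumed): satisfy the smallest requests first. */
final class SmallestFirst implements AllocationGoal {
    @Override
    public int[] divide(int totalFrames, int[] requestedFrames) {
        int n = requestedFrames.length;
        // Sort operator indexes by request size; the sort is stable for ties.
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingInt(k -> requestedFrames[k]));
        int[] grants = new int[n];
        int remaining = totalFrames;
        for (int idx : order) {
            grants[idx] = Math.min(remaining, requestedFrames[idx]);
            remaining -= grants[idx];
        }
        return grants;
    }
}
```

With a budget of 100 frames and requests of 10, 80 and 80 frames, EqualShare
grants 10, 33 and 33, while SmallestFirst grants 10, 80 and 10. A user-defined
goal would simply be a third implementation of the same interface.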

>
> - My understanding is that, with Hyracks (still going over the paper…),
> execution is scheduled across compute nodes first, and dynamic memory
> management decisions will be made locally at each node. Is this accurate,
> or are there cases where dynamic memory behavior needs coordination across
> nodes at the query level?
>
Making the decisions locally sounds good, but the logic of your approach
should ensure there can never be a deadlock. For example, an instance of a
query running on a data partition should not run out of memory, because if
that happens the query partially fails.
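
One way to picture a node-local scheme that respects this is sketched below
in Java. This is an illustration under stated assumptions, not how Hyracks
works today: NodeMemoryBroker and its methods are hypothetical names, and
memory is counted in abstract frames. The sketch assumes every operator
instance can state a minimum number of frames it needs to finish. That
minimum is reserved all-or-nothing before the instance starts, and any later
growth is a non-blocking request, so no instance ever waits for memory while
holding memory.

```java
/** Hypothetical per-node broker; tracks reserved frames against a fixed capacity. */
final class NodeMemoryBroker {
    private final int capacityFrames;
    private int reservedFrames;

    NodeMemoryBroker(int capacityFrames) {
        this.capacityFrames = capacityFrames;
    }

    /**
     * All-or-nothing admission: reserve the minimum an instance needs to finish,
     * or refuse so the caller can queue it before it has started any work.
     */
    synchronized boolean admit(int minFrames) {
        if (minFrames > capacityFrames - reservedFrames) {
            return false;
        }
        reservedFrames += minFrames;
        return true;
    }

    /**
     * Non-blocking growth: returns how many extra frames were granted (maybe 0).
     * The caller keeps running with what it has (for example by spilling more)
     * instead of waiting, which is what rules out wait-while-holding deadlocks.
     */
    synchronized int tryGrow(int wantedFrames) {
        int granted = Math.min(wantedFrames, capacityFrames - reservedFrames);
        reservedFrames += granted;
        return granted;
    }

    /** Returns frames to the pool when an instance shrinks or finishes. */
    synchronized void release(int frames) {
        reservedFrames = Math.max(0, reservedFrames - frames);
    }

    synchronized int freeFrames() {
        return capacityFrames - reservedFrames;
    }
}
```

On a 100-frame node, admit(60) succeeds, a second admit(60) is refused before
that instance starts, and tryGrow(50) by the first instance returns the 40
frames that are left rather than blocking.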

>
> I’ll continue going through the papers and will send a draft proposal soon.
>
> Best,
> Divij Agarwal
>
Best,
Shiva

>
> On Tue, Feb 24, 2026 at 7:02 PM Shiva Jahangiri <[email protected]>
> wrote:
>
> > Hi Divij,
> >
> > Thanks for your interest in AsterixDB and this project! Currently there
> > isn’t any specific Jira issue with regard to this project, but there are
> > a few ways you can familiarize yourself with sections of the code that
> > are relevant to this project.
> >
> > It’s great that you have got AsterixDB running! The next step would be
> > to try the SQL++ Primer
> > <https://asterixdb.apache.org/docs/0.9.9/sqlpp/primer-sqlpp.html> with
> > its toy datasets and queries to familiarize yourself with some of the
> > features of AsterixDB. If interested, select the Optimized Logical Plan
> > and Hyracks Jobs from AsterixDB's UI and explore them further!
> >
> > If you want to be able to debug AsterixDB locally through IntelliJ or
> > your preferred IDE, you can follow these steps
> > <https://scudbis.notion.site/Guide-to-Setup-Asterix-Locally-6cc6da8e8130483f9bc9e2a51ccbcc71>.
> > Then you can try to find and follow how memory intensive operators (e.g.
> > OptimizedHybridHashJoinOperatorDescriptor.java) work, if you are
> > interested. This should already be plenty! :-)
> >
> > Let us know if you have any questions!
> >
> > Best,
> >
> > Shiva Jahangiri
> > Assistant Professor in Computer Science and Engineering Department
> > Santa Clara University
> >
> >
> >
> > On Tue, Feb 24, 2026 at 5:02 AM Divij Agarwal <[email protected]> wrote:
> >
> > > Hi all,
> > >
> > > I am Divij Agarwal, a Computer Science junior at Purdue University. I
> > > have been a Java teaching assistant at my university for the past two
> > > years and am interested in the Dynamic Memory Management project for
> > > GSoC 2026. I implemented a custom malloc in C as part of coursework,
> > > which sparked my interest in systems programming and memory allocation
> > > strategies.
> > >
> > > I was also able to build and run AsterixDB locally. Are there any
> > > issues or starter tasks related to memory management that I can begin
> > > working on? I would appreciate guidance on where to start contributing.
> > >
> > > Best regards,
> > >
> > > Divij Agarwal
> > >
> >
>


-- 
Shiva Jahangiri
Assistant Professor in Computer Science and Engineering Department
Santa Clara University
