Re: [GSoC 2026 Proposal] libgomp Optimizations for Scheduler Guided OpenMP Execution

Himadri Chhaya-Shailesh via Gcc Tue, 31 Mar 2026 03:29:37 -0700

Hi Souradeep,

Thank you for your interest in the project. I appreciate the time and 
effort involved in preparing a proposal. Unfortunately, at this stage, 
we’re not able to consider last-minute submissions, as there would not 
be enough time for you to complete the prerequisite task or for us to 
review and discuss your proposal properly.


Himadri

On 3/31/26 11:10, Souradeep Banerjee wrote:
> Dear GCC community and mentors (Himadri, Andrea, Tobias, Thomas),
>
> I'm Souradeep Banerjee, an international student between sophomore and
> junior level (I graduate next spring), majoring in Computer Science at
> Arizona State University. I have a deep passion for low-level systems
> and compiler programming, and I am looking for different ways to build
> and prove my skill set. I'm reaching out regarding the "libgomp
> Optimizations for Scheduler Guided OpenMP Execution in Cloud VMs"
> project for GSoC 2026.
>
> I know that I am entering the process close to the deadline, but I
> have spent the past few days doing a deep dive into the FOSDEM '26
> presentation, the semantic gap causing Phantom vCPUs, and the
> "Juunansei" paravirtualized policies.
>
> I'm interested in this problem because it relates to projects I have
> been working on for quite a while. I'm currently building a MIPS
> emulator that follows the pipeline architecture described in Patterson
> and Hennessy's "Computer Organization and Design" (MIPS edition). The
> current version has a simple two-pass assembler and a cycle-accurate
> simulation of MIPS instructions, along with a hazard detection unit and
> other simple features such as a branch target buffer. I'm now reading
> books and tutorials to integrate a virtual memory management system
> into the simulator to support paging. I'm also building a C compiler
> from scratch that strictly follows the ISO C11 standard (so far, I have
> only completed the lexer).
>
> As I finalize my proposal tonight, I wanted to share my technical
> approach and ask two quick architectural questions to ensure my
> implementation plan is aligned with your expectations:
>
> - The Shared Memory Bridge: To expose the eBPF map data (phantom and
> idle averages) to guest user space without syscall overhead, my
> proposal relies on ivshmem (Inter-VM Shared Memory). Is this the
> preferred low-latency mechanism the team envisions, or is there a
> different memory-mapped file approach I should prioritize?
>
> - libgomp Integration Entry Point: I saw Yuao Ma's recent draft patch
> for parsing OMP_WAIT_POLICY=pvsched. My plan is to start by applying
> and rebasing that patch, replicating its logic to register
> GOMP_DYNAMIC_POLICY, and then implementing the Juunansei algorithm's
> 2-tick stability check and the juunansei_cpuset affinity logic. Is
> this the right order for the runtime modifications?
>
> I have pasted the plain text of my full proposal draft below for any
> last-minute feedback. I will be submitting the official PDF to the
> GSoC portal this morning.
>
> Thank you for your time, your incredible work on this project, and the
> fantastic FOSDEM presentation.
>
> Yours sincerely,
> Souradeep Banerjee
> [email protected] | [email protected]
> https://souradeep.dev
> https://github.com/Souradeep1101
> ________________
>
> GCC GSoC 2026: libgomp Optimizations for Scheduler Guided OpenMP
> Execution in Cloud VMs
>
> Name: Souradeep Banerjee
> Email: [email protected]
> Alternate Email: [email protected]
> University: Arizona State University
> Mentor(s): Himadri CS, Andrea Righi, Tobias Burnus, Thomas Schwinge
> Mentor Email(s): [email protected], [email protected],
> [email protected], [email protected]
> ________________
>
>
> Proposal Summary
> The problem with OpenMP execution in oversubscribed cloud VMs is a
> critical semantic gap between the hypervisor on the host and the OS in
> the guest: when the host preempts a vCPU, the OpenMP runtime in the
> guest is unaware of it. As a result, threads keep actively spinning at
> barrier synchronizations, waiting for these Phantom vCPUs to turn up,
> and millions of CPU cycles can be wasted. This semantic gap is
> analogous to the many-to-few problem in scheduling user-space threads
> onto kernel threads, covered in my operating systems course, where
> upcalls are needed to prevent cascading blocks.
> The goal of this project is to remove this bottleneck in
> oversubscribed cloud VMs by building a paravirtualized,
> scheduler-informed OpenMP architecture. I will write an eBPF Phantom
> Tracker compiled with GCC's eBPF backend, and implement the
> "Juunansei" dynamic policies in libgomp to adjust the Degree of
> Parallelism (DoP) dynamically and to block threads waiting on
> preempted vCPUs.
> ________________
>
>
> Solution Outline
> My initial plan is as follows:
> 1. eBPF Phantom Tracker: My first objective is to port the existing
> 500+-line in-kernel Phantom Tracker implementation to an eBPF program.
> To do this, I propose writing a C program, compiled with GCC's eBPF
> backend, that hooks into two Linux kernel tracepoints: sched_switch
> and sched_wakeup. These two events will let me compute phantom_average
> and idle_average continuously, at a granularity determined by the
> kernel's scheduling tick.
> 2. Low-Latency Shared Memory Bridge: For guest user space to react
> dynamically, it must be able to read data from the host's eBPF map
> without the cost of context switches and system calls. To achieve
> this, I propose a low-latency shared memory bridge that gives the VM
> access to the data in the host's eBPF map; my primary choice is
> ivshmem (Inter-VM Shared Memory).
> 3. libgomp Paravirtualized Policies (Juunansei): I will start work in
> the libgomp source tree by applying and rebasing Yuao Ma's existing
> draft patch that parses OMP_WAIT_POLICY in env.c. Next, I will
> replicate the parsing logic to register the new GOMP_DYNAMIC_POLICY
> ICV. Then I will implement the paravirtualized runtime logic:
>     1. For GOMP_DYNAMIC_POLICY=pvsched: Following the Juunansei
> algorithm, I will modify the gomp_dynamic_max_threads() routine. If
> phantom_average > 0, the DoP is reduced immediately. If idle cores are
> available, the DoP is increased conservatively: the idle capacity must
> be stable for 2 ticks before scaling up, to avoid thrashing. If
> phantoms are detected after scaling up, a penalty is applied:
> stability_requirement *= 2. Additionally, I plan to use the
> juunansei_cpuset routine to shrink the thread affinity mask to the
> range [0, N-P-1] (N vCPUs, P of them phantoms).
>     2. For OMP_WAIT_POLICY=pvsched: I will modify the do_spin routine.
> When a thread reaches the team/dock barrier, it will read the
> scheduler state from shared memory. If the peer it is waiting on is
> flagged as a Phantom, the thread will skip the normal spin loop and
> call block() immediately, yielding the physical CPU back to the host.
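> A minimal sketch of the DoP rule and affinity shrink in item 1, written
> as a standalone C model rather than libgomp code. The names
> (struct dop_state, dop_next, pv_shrink_cpuset) are hypothetical, and
> several details are my own assumptions: P is phantom_average rounded
> up, the DoP grows by the whole number of idle vCPUs once the stability
> requirement is met, the penalty applies only to phantoms seen on the
> tick right after a scale-up, and stability_requirement is capped at 64.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Hypothetical state for a pvsched DoP controller (not libgomp code). */
struct dop_state {
  int nr_cpus;        /* N: vCPUs visible to the guest */
  int dop;            /* current degree of parallelism */
  int stable_ticks;   /* consecutive phantom-free ticks with idle capacity */
  int stability_req;  /* ticks required before growing; starts at 2 */
  int just_grew;      /* nonzero if the previous tick increased the DoP */
};

/* Called once per scheduler tick with the averages read from the host.
   Returns the DoP to use for the next parallel region. */
static int dop_next(struct dop_state *s, double phantom_avg, double idle_avg)
{
  /* P: phantom_average rounded up (assumption). */
  int p = (int) phantom_avg;
  if (p < phantom_avg)
    p++;

  if (p > 0) {
    /* Penalty: phantoms appeared right after growing, so demand more
       stability next time (capped to avoid overflow). */
    if (s->just_grew && s->stability_req < 64)
      s->stability_req *= 2;
    /* Shrink immediately to at most N - P threads, never below 1. */
    int target = s->nr_cpus - p;
    if (target < 1)
      target = 1;
    if (target < s->dop)
      s->dop = target;
    s->stable_ticks = 0;
    s->just_grew = 0;
    return s->dop;
  }

  s->just_grew = 0;
  int idle = (int) idle_avg;  /* whole idle vCPUs */
  if (idle > 0 && s->dop < s->nr_cpus) {
    /* Grow conservatively: only after stability_req quiet ticks. */
    if (++s->stable_ticks >= s->stability_req) {
      s->dop = s->dop + idle < s->nr_cpus ? s->dop + idle : s->nr_cpus;
      s->stable_ticks = 0;
      s->just_grew = 1;
    }
  } else {
    s->stable_ticks = 0;
  }
  return s->dop;
}

/* Affinity mask covering vCPUs [0, N-P-1]; returns the number of CPUs set. */
static int pv_shrink_cpuset(int n, int p, cpu_set_t *set)
{
  assert(n > 0 && p >= 0);
  int last = n - p - 1;
  if (last < 0)
    last = 0;  /* always keep at least vCPU 0 */
  CPU_ZERO(set);
  for (int c = 0; c <= last; c++)
    CPU_SET(c, set);
  return last + 1;
}
```

> For example, with N = 8, a tick with phantom_average = 1.5 drops the
> DoP to 6; two quiet ticks with idle_average = 2.0 raise it back to 8;
> and a phantom on the very next tick lowers it to 7 and doubles
> stability_requirement to 4.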
> ________________
>
>
> Project Schedule
> I commit to a 30 to 40-hour work week during the coding period (May
> 25 to August 24) to fulfill the requirements of this advanced project.
> Note: I will be participating in a study abroad program in Tokyo with
> Arizona State University from May 15 to May 31. I will frontload my
> codebase research into the first half of the Community Bonding period
> (May 1 to May 14). During Week 1 of the coding period (May 25 to May
> 31), I will manage my hours around my program schedule and
> synchronize my availability with my mentors' time zones.
> Please note that this schedule is tentative and subject to change.
> Community Bonding Period (May 1 to May 24)
> * Objective: Codebase navigation, environment setup, and architectural
> finalization.
> * Tasks:
>     * Clone, build, and debug the GCC toolchain and libgomp locally from 
> source.
>     * Set up a hardware-assisted virtualized test environment
> (QEMU/KVM) to simulate host-level oversubscription.
>     * Study the existing ~500-line in-kernel Phantom Tracker to trace
> the exact vCPU state calculation logic.
>     * Finalize the exact shared memory (ivshmem) architecture and eBPF
> hook points with my mentors.
> Phase 1: eBPF Phantom Tracker (Weeks 1 to 3 | May 25 to June 14)
> * Objective: Implement host-level vCPU tracking using GCC's eBPF backend.
> * Tasks:
>     * Write the C program targeting the GCC eBPF backend to hook into
> the Linux kernel's sched_switch and sched_wakeup tracepoints.
>     * Implement the state machine logic to record timestamps of vCPU
> preemptions and wakeups.
>     * Calculate the phantom_average and idle_average metrics at the
> granularity of the scheduler tick.
>     * Validate the tracepoint logic and tick calculations locally using 
> bpftool.
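> To check the accounting logic before writing the eBPF version, I could
> first model it in user space. The sketch below is only an assumption
> about how the tracker might classify vCPU threads: sched_wakeup makes a
> vCPU runnable but not yet running (Phantom), sched_switch onto the
> vCPU marks it Running, and sched_switch away from it marks it Phantom
> if it was preempted (prev_state still runnable) or Idle if it blocked.
> The averages are defined here, hypothetically, as the mean number of
> vCPUs in each state over one tick. All names are illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_VCPUS 64

/* Hypothetical per-vCPU states derived from host scheduler events. */
enum vcpu_state { VCPU_IDLE, VCPU_PHANTOM, VCPU_RUNNING };

struct vcpu_track {
  enum vcpu_state state;
  uint64_t since_ns;       /* timestamp of the last state change */
};

struct tracker {
  int nr_vcpus;
  struct vcpu_track v[MAX_VCPUS];
  uint64_t tick_start_ns;
  uint64_t phantom_ns;     /* vCPU-ns runnable but not running this tick */
  uint64_t idle_ns;        /* vCPU-ns sleeping (blocked) this tick */
};

static void tracker_init(struct tracker *tr, int nr_vcpus, uint64_t now)
{
  assert(nr_vcpus > 0 && nr_vcpus <= MAX_VCPUS);
  memset(tr, 0, sizeof *tr);
  tr->nr_vcpus = nr_vcpus;
  tr->tick_start_ns = now;
  for (int i = 0; i < nr_vcpus; i++) {
    tr->v[i].state = VCPU_IDLE;
    tr->v[i].since_ns = now;
  }
}

/* Charge the time since the last change to the old state, then switch. */
static void set_state(struct tracker *tr, int cpu, enum vcpu_state s,
                      uint64_t now)
{
  assert(cpu >= 0 && cpu < tr->nr_vcpus);
  struct vcpu_track *v = &tr->v[cpu];
  uint64_t dt = now - v->since_ns;
  if (v->state == VCPU_PHANTOM)
    tr->phantom_ns += dt;
  else if (v->state == VCPU_IDLE)
    tr->idle_ns += dt;
  v->state = s;
  v->since_ns = now;
}

/* sched_wakeup: the vCPU thread is runnable but not yet on a pCPU. */
static void on_wakeup(struct tracker *tr, int cpu, uint64_t now)
{ set_state(tr, cpu, VCPU_PHANTOM, now); }

/* sched_switch with next == this vCPU thread. */
static void on_switch_in(struct tracker *tr, int cpu, uint64_t now)
{ set_state(tr, cpu, VCPU_RUNNING, now); }

/* sched_switch with prev == this vCPU thread: preempted means still
   runnable (Phantom); otherwise it blocked (Idle). */
static void on_switch_out(struct tracker *tr, int cpu, int preempted,
                          uint64_t now)
{ set_state(tr, cpu, preempted ? VCPU_PHANTOM : VCPU_IDLE, now); }

/* Tick boundary: close open intervals, report the mean number of vCPUs
   in each state over the tick, and start a new tick. */
static void tracker_tick(struct tracker *tr, uint64_t now,
                         double *phantom_avg, double *idle_avg)
{
  for (int i = 0; i < tr->nr_vcpus; i++)
    set_state(tr, i, tr->v[i].state, now);
  uint64_t len = now - tr->tick_start_ns;
  *phantom_avg = len ? (double) tr->phantom_ns / (double) len : 0.0;
  *idle_avg = len ? (double) tr->idle_ns / (double) len : 0.0;
  tr->phantom_ns = 0;
  tr->idle_ns = 0;
  tr->tick_start_ns = now;
}
```

> In an eBPF program, the same transitions would be driven from the
> sched_switch and sched_wakeup tracepoint handlers, with the per-vCPU
> state kept in a BPF map instead of a plain array.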
> Phase 2: Low-Latency Shared Memory Bridge (Weeks 4 to 6 | June 15 to July 5)
> * Objective: Expose the eBPF host scheduler state to the guest user-space.
> * Tasks:
>     * Architect and implement the ivshmem (Inter-VM Shared Memory) mechanism.
>     * Map the eBPF map data (containing the phantom and idle averages)
> directly into the guest VM's memory space.
>     * Write a minimal C test program inside the guest to verify
> real-time, low-latency reads of the shared memory without invoking
> heavy system calls.
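> One way to let the guest read a consistent snapshot without syscalls
> is a sequence counter (seqlock) in the shared region: the host bumps
> the counter to an odd value, updates the fields, then makes it even
> again, and the guest retries if it saw an odd or changed counter. The
> layout below (struct pvsched_shm, fixed-point averages, a 64-bit
> phantom bitmask) is a hypothetical design, and it assumes that
> lock-free 32/64-bit atomics behave the same through the host and
> guest mappings of the ivshmem region, as I expect on x86-64.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout of the shared page written by the host. */
struct pvsched_shm {
  _Atomic uint32_t seq;                /* odd while the host is writing */
  _Atomic uint32_t phantom_avg_milli;  /* phantom_average * 1000 */
  _Atomic uint32_t idle_avg_milli;     /* idle_average * 1000 */
  _Atomic uint64_t phantom_mask;       /* bit i set: vCPU i is a phantom */
};

/* Plain copy the guest works with after a successful read. */
struct pvsched_view {
  uint32_t phantom_avg_milli;
  uint32_t idle_avg_milli;
  uint64_t phantom_mask;
};

/* Writer side (single writer, e.g. a host agent draining the eBPF map). */
static void pvsched_publish(struct pvsched_shm *s, uint32_t phantom_milli,
                            uint32_t idle_milli, uint64_t mask)
{
  uint32_t seq = atomic_load_explicit(&s->seq, memory_order_relaxed);
  atomic_store_explicit(&s->seq, seq + 1, memory_order_relaxed); /* odd */
  atomic_thread_fence(memory_order_release);
  atomic_store_explicit(&s->phantom_avg_milli, phantom_milli,
                        memory_order_relaxed);
  atomic_store_explicit(&s->idle_avg_milli, idle_milli,
                        memory_order_relaxed);
  atomic_store_explicit(&s->phantom_mask, mask, memory_order_relaxed);
  atomic_store_explicit(&s->seq, seq + 2, memory_order_release); /* even */
}

/* Reader side (guest user space): retry until a stable, even sequence. */
static void pvsched_snapshot(struct pvsched_shm *s, struct pvsched_view *out)
{
  uint32_t before, after;
  do {
    before = atomic_load_explicit(&s->seq, memory_order_acquire);
    out->phantom_avg_milli =
      atomic_load_explicit(&s->phantom_avg_milli, memory_order_relaxed);
    out->idle_avg_milli =
      atomic_load_explicit(&s->idle_avg_milli, memory_order_relaxed);
    out->phantom_mask =
      atomic_load_explicit(&s->phantom_mask, memory_order_relaxed);
    atomic_thread_fence(memory_order_acquire);
    after = atomic_load_explicit(&s->seq, memory_order_relaxed);
  } while ((before & 1) || before != after);
}

/* Nonzero if vCPU `cpu` was flagged as a phantom in the snapshot. */
static int pvsched_is_phantom(const struct pvsched_view *v, int cpu)
{
  assert(cpu >= 0 && cpu < 64);
  return (int) ((v->phantom_mask >> cpu) & 1);
}
```

> Inside the guest, the region would come from mmap() on the ivshmem PCI
> device's BAR2 (for example through its sysfs resource2 file); that
> path and the host-side agent are assumptions to confirm with the
> mentors.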
> Phase 3: libgomp Paravirtualized Policies (Weeks 7 to 9 | July 6 to July 26)
> * Objective: Implement the Juunansei dynamic policies inside the OpenMP 
> runtime.
> * Tasks:
>     * Apply and rebase Yuao Ma's draft patch for OMP_WAIT_POLICY
> parsing in env.c, and replicate the logic to register
> GOMP_DYNAMIC_POLICY.
>     * Modify gomp_dynamic_max_threads() for
> GOMP_DYNAMIC_POLICY=pvsched: instantly scale down DoP if phantoms
> exist, or conservatively scale up, requiring stability over 2
> scheduler ticks. Implement juunansei_cpuset to pack threads on active
> vCPUs.
>     * Rewrite the do_spin primitive for OMP_WAIT_POLICY=pvsched to
> read the shared memory and immediately block() if the target peer is a
> flagged Phantom.
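> The pvsched wait policy can be sketched as a spin routine that gives
> up before spinning when a peer it depends on is flagged as a Phantom.
> This is a standalone model, not a patch: the name pv_do_spin and the
> peers_mask parameter (the vCPUs whose threads the waiter depends on)
> are hypothetical. It follows the convention I understand libgomp's
> Linux do_spin to use: return nonzero when the caller should stop
> spinning and block (futex_wait), and zero when the awaited value has
> changed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pvsched spin: nonzero means "block now", zero means
   *addr changed away from val while spinning. */
static int pv_do_spin(int *addr, int val, unsigned long long count,
                      uint64_t phantom_mask, uint64_t peers_mask)
{
  assert(addr != NULL);
  /* A peer's vCPU is preempted on the host: spinning cannot make it
     arrive sooner, so give the physical CPU back right away. */
  if (phantom_mask & peers_mask)
    return 1;
  for (unsigned long long i = 0; i < count; i++) {
    if (__atomic_load_n(addr, __ATOMIC_RELAXED) != val)
      return 0;
#if defined(__x86_64__) || defined(__i386__)
    __builtin_ia32_pause();  /* spin-wait hint to the CPU */
#endif
  }
  return 1;  /* spin budget exhausted: fall back to blocking */
}
```

> A real version would also refresh the phantom mask periodically while
> spinning, since a peer can be preempted after the first check.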
> Phase 4: Benchmarking & Upstreaming (Weeks 10 to 12 | July 27 to August 17)
> * Objective: Prove the performance gain and merge the code.
> * Tasks:
>     * Benchmark the new pvsched policies against the static baseline
> (OMP_WAIT_POLICY=passive) using NAS Parallel Benchmarks (BT, CG, FT,
> LU, MG, SP, UA) with Class B inputs.
>     * Simulate host-level oversubscription using a competing "Random
> Spinners" workload managed via cgroups.
>     * Write comprehensive DejaGnu testsuite coverage for the new
> environment variables and update the libgomp.texi documentation.
>     * Format the commits to strictly adhere to GCC's coding style
> guidelines and submit the patch series to the gcc-patches mailing
> list.
> Final Week (August 18 to August 24)
>     * Buffer week for addressing final patch review comments from the
> mailing list, cleaning up documentation, and submitting the final GSoC
> evaluation.
> ________________
>
>
> Relevant Experience
> Core Languages and Tools: C, C++, Assembly (MIPS), Python, GDB, CMake,
> Git, Linux
> Projects and Involvement
> Systems Programming
> I implemented a cycle-accurate MIPS processor with a 5-stage pipeline
> in C++23. The implementation includes the hardwired pipeline and
> hazard detection logic for load-use stalls and branch delays. I am
> currently designing the Coprocessor 0 (CP0) unit, which manages
> exceptions and processor state transitions, along with the Memory
> Management Unit (MMU) and Translation Lookaside Buffer (TLB).
> Reasoning about precise processor state transitions and execution
> hazards gives me intuition for how vCPUs are scheduled, preempted, and
> tracked via Linux kernel tracepoints, which is at the core of the eBPF
> Phantom Tracker.
> Source Code: https://github.com/Souradeep1101/CyclopsMIPS
> Compiler Programming
> I designed an optimized, ISO C11-compliant lexical analyzer from
> scratch, implementing strict tokenization rules and character-level
> memory management to speed up source parsing. I am currently
> designing the parser and semantic analysis stages, and researching
> Abstract Syntax Tree techniques for enforcing C's complex type
> constraints. This low-level work on parsing gives me a foundation for
> navigating the GCC codebase, targeting the eBPF backend for the
> tracker, and safely parsing the new GOMP_DYNAMIC_POLICY ICV inside
> libgomp.
> Source Code: https://github.com/Souradeep1101/CCompiler
> Graphics Programming
> For Cyclops Studio, a high-frequency 2D animation engine, I
> implemented a rendering pipeline that uses a CPU/GPU memory arena to
> remove allocation costs during batch submission, with strict buffer
> management via direct pointer arithmetic and memset for Vertex Buffer
> Object scratchpad management. This experience with high-frequency data
> structures and low-abstraction memory management prepares me to build
> the low-latency ivshmem bridge between the host kernel and guest user
> space.
> Source Code: https://github.com/Souradeep1101/CyclopsStudio
> Competitive Programming
> I placed 2nd at the 2025 ICPC North American Qualifier (ASU Site) and
> have been a finalist at the ICPC Rocky Mountain Regional Contest,
> representing Arizona State University. I am also enrolled in an
> advanced algorithms seminar with the ASU ICPC coach, and I am an
> officer at the ACM student chapter, where I organize algorithmic
> workshops called "ICPC Primer", for which I have written extensive
> documentation on complex problem-solving. This training has taught me
> to isolate edge cases, reason carefully about execution logic, and
> write optimized, memory-safe C/C++ code under time constraints, which
> is the kind of rigor needed to implement the Juunansei heuristics (DoP
> scaling and blocking) inside the libgomp runtime without adding
> overhead.
> Workshop Documentation & Portfolio:
> https://www.souradeep.dev/blog/icpc-spring-2026-overview
>
> ________________
>
>
> Commitments and Availability
> Academic Status
> I am an international student from India, majoring in Computer Science
> (Software Engineering) at Arizona State University. I am on an
> accelerated track to complete my Bachelor's degree in three years,
> with an expected graduation date of May 2027. I am also on an
> accelerated track to complete my Master’s degree in one year, with an
> expected graduation date of May 2028.
> Summer Capacity
> I am fully prepared to treat GSoC as my primary summer commitment. I
> will dedicate 30 to 40 hours per week to the GCC project for the
> duration of the standard coding period to make sure the 350-hour
> requirement is comfortably met and exceeded.
