Hi Souradeep,

Thank you for your interest in the project. I appreciate the time and effort involved in preparing a proposal. Unfortunately, at this stage, we’re not able to consider last-minute submissions, as there would not be enough time for you to complete the prerequisite task or for us to review and discuss your proposal properly.

Himadri

On 3/31/26 11:10, Souradeep Banerjee wrote:
> Dear GCC community and mentors (Himadri, Andrea, Tobias, Thomas),
>
> I'm Souradeep Banerjee, an international student between sophomore and
> junior level (I graduate next spring), majoring in Computer Science at
> Arizona State University. I have a deep passion for systems
> programming, and I am looking for different ways to build and prove my
> skillset. I'm reaching out regarding the "libgomp Optimizations for
> Scheduler Guided OpenMP Execution in Cloud VMs" project for GSoC 2026.
> I know that I am entering the process close to the deadline, but I
> have spent the past few days doing a deep dive into the FOSDEM '26
> presentation, the semantic gap causing Phantom vCPUs, and the
> "Juunansei" paravirtualized policies.
>
> I'm interested in this problem because it relates to some projects
> that I've been working on for quite a while. I'm currently building a
> MIPS emulator that follows the pipeline architecture described in the
> MIPS computer organisation book by Patterson and Hennessy. The current
> version of this project has a simple two-pass assembler and a
> cycle-accurate simulation of MIPS instructions, along with a hazard
> detection unit and other simple features like a branch target buffer.
> I'm currently reading books and tutorials to integrate a virtual
> memory management system into my simulator to support paging. I'm also
> working on building a C compiler from scratch that strictly follows
> the C11 ISO standard (so far I've only completed the lexer).
>
> As I finalize my proposal tonight, I wanted to share my technical
> approach and ask two quick architectural questions to ensure my
> implementation plan is aligned with your expectations:
>
> - The Shared Memory Bridge: To expose the eBPF map data (phantom and
> idle averages) to the guest user-space without syscall overhead, my
> proposal relies on utilizing ivshmem (Inter-VM Shared Memory). Is this
> the preferred low-latency mechanism the team envisions, or is there a
> different memory-mapped file approach I should prioritize?
>
> - libgomp Integration Entry Point: I saw Yuao Ma's recent draft patch
> for parsing OMP_WAIT_POLICY=pvsched. My plan is to start by applying
> and rebasing that patch, replicating the logic to register
> GOMP_DYNAMIC_POLICY, and then implementing the Juunansei algorithm's
> 2-tick stability check and juunansei_cpuset affinity logic. Does this
> sound like the correct chronological approach for the runtime
> modifications?
>
> I have pasted the plain text of my full proposal draft below for any
> last-minute feedback. I will be submitting the official PDF to the
> GSoC portal this morning.
>
> Thank you for your time, your incredible work on this project, and the
> fantastic FOSDEM presentation.
>
> Yours sincerely,
> Souradeep Banerjee
> [email protected] | [email protected]
> https://souradeep.dev
> https://github.com/Souradeep1101
> ________________
>
> GCC GSoC 2026: libgomp Optimizations for Scheduler Guided OpenMP
> Execution in Cloud VMs
>
> Name: Souradeep Banerjee
> Alternate Email: [email protected]
> Email: [email protected]
> University: Arizona State University
> Mentor(s): Himadri CS, Andrea Righi, Tobias Burnus, Thomas Schwinge
> Mentor Email(s): [email protected], [email protected],
> [email protected], [email protected]
> ________________
>
> Proposal Summary
> The problem with OpenMP execution within oversubscribed cloud VMs is
> that there is a critical semantic gap between the hypervisor on the
> host and the OS within the guest. When a vCPU is preempted on the
> host, the OpenMP runtime within the guest is unaware of it. This leads
> to millions of CPU cycles being wasted as threads actively spin at
> barrier synchronizations, waiting for these Phantom vCPUs to turn up.
> This semantic gap resembles the many-to-few problem in scheduling
> user-space threads onto kernel-space threads, as taught in my course
> on operating systems, where upcalls are needed to prevent cascading
> blocks.
> The goal of this project is to remove this bottleneck within
> oversubscribed cloud VMs by building a paravirtualized,
> scheduler-informed OpenMP architecture. I will implement an eBPF
> Phantom Tracker using GCC's eBPF backend, and implement the
> "Juunansei" dynamic policies within libgomp to dynamically adjust the
> Degree of Parallelism (DoP) and block threads waiting on preempted
> vCPUs.
> ________________
>
> Solution Outline
> My planned approach is as follows:
> 1. eBPF Phantom Tracker: My initial objective is to port the existing
> 500+-line in-kernel implementation of Phantom Tracker into an eBPF
> program.
> To achieve this, I propose writing a C program and compiling it with
> GCC’s eBPF backend, setting it up to hook into two Linux kernel
> tracepoints: sched_switch and sched_wakeup. These two events will
> enable me to compute phantom_average and idle_average continuously at
> a granularity determined by the kernel’s scheduling tick.
> 2. Low-Latency Shared Memory Bridge: For the guest user space to react
> dynamically to events, it must be able to read data from the host’s
> eBPF map without the cost of expensive context switches and system
> calls. To achieve this, I propose architecting a low-latency shared
> memory bridge that gives the VM access to the data in the host’s eBPF
> map; my primary choice is ivshmem (Inter-VM Shared Memory).
> 3. libgomp Paravirtualized Policies (Juunansei): I will start my work
> inside the libgomp source tree by applying and rebasing the existing
> draft patch by Yuao Ma on parsing OMP_WAIT_POLICY inside env.c. Next,
> I will replicate the parsing logic to register the new
> GOMP_DYNAMIC_POLICY ICV. Then, I will proceed with the implementation
> of the paravirtualized runtime logic:
>   1. For GOMP_DYNAMIC_POLICY=pvsched: Following the Juunansei
>   algorithm, I will modify the gomp_dynamic_max_threads() routine.
>   The logic will be as follows: if phantom_average > 0, the DoP is
>   immediately reduced. If idle cores are available, the DoP is
>   increased conservatively, i.e., only after 2 stable ticks, to avoid
>   thrashing. If we detect phantoms after scaling, we apply a penalty:
>   stability_requirement *= 2. Additionally, I plan to use the routine
>   juunansei_cpuset to shrink the thread affinity mask to the range
>   [0, N-P-1].
>   2. For OMP_WAIT_POLICY=pvsched: I will modify the routine do_spin.
>   When the thread reaches the team/dock barrier, the thread will read
>   the state from shared memory.
> If the peer is detected as Phantom, the thread will skip the normal
> spin-loop routine and call block() immediately to yield the physical
> CPU back to the host.
> ________________
>
> Project Schedule
> I am going to commit to 30 to 40 hours of work per week during the
> coding period (May 25 to August 24) to fulfill the requirements for
> this advanced project.
> Note: I will be participating in a study abroad program in Tokyo with
> Arizona State University from May 15 to May 31. I will frontload and
> work on my codebase research during the first half of the Community
> Bonding period (May 1 to May 14). During Week 1 of the coding period
> (May 25 to May 31), I will manage my hours around my program schedule
> and synchronize my availability with my mentors' time zones.
> Please note that this schedule is tentative and subject to change.
> Community Bonding Period (May 1 to May 24)
> * Objective: Codebase navigation, environment setup, and architectural
> finalization.
> * Tasks:
>   * Clone, build, and debug the GCC toolchain and libgomp locally from
>   source.
>   * Set up a hardware-assisted virtualized test environment
>   (QEMU/KVM) to simulate host-level oversubscription.
>   * Study the existing ~500-line in-kernel Phantom Tracker to trace
>   the exact vCPU state calculation logic.
>   * Finalize the exact shared memory (ivshmem) architecture and eBPF
>   hook points with my mentors.
> Phase 1: eBPF Phantom Tracker (Weeks 1 to 3 | May 25 to June 14)
> * Objective: Implement host-level vCPU tracking using GCC's eBPF backend.
> * Tasks:
>   * Write the C program targeting the GCC eBPF backend to hook into
>   the Linux kernel's sched_switch and sched_wakeup tracepoints.
>   * Implement the state machine logic to record timestamps of vCPU
>   preemptions and wakeups.
>   * Calculate the phantom_average and idle_average metrics at the
>   granularity of the scheduler tick.
>   * Validate the tracepoint logic and tick calculations locally using
>   bpftool.
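
As a rough illustration of the Phase 1 bookkeeping, here is a plain user-space C sketch (not eBPF code) of recording preemption timestamps and folding them into a per-tick phantom figure. Everything in it is an assumption made for illustration: the struct and handler names are hypothetical, and so is the definition of the metric as total preempted vCPU time per tick, expressed in per-mille. The existing in-kernel tracker may well compute phantom_average differently, and idle_average is left out. Integer arithmetic is used because eBPF programs cannot use floating point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_VCPUS 64

/* Hypothetical per-vCPU bookkeeping; all names are illustrative only. */
struct vcpu_track {
    bool preempted;       /* true while the vCPU thread is switched out */
    uint64_t since_ns;    /* start of the current preempted interval */
    uint64_t phantom_ns;  /* preempted time accumulated in this tick */
};

struct tracker {
    struct vcpu_track vcpu[MAX_VCPUS];
    unsigned nr_vcpus;
    uint64_t tick_ns;           /* length of one scheduler tick */
    uint64_t phantom_permille;  /* assumed metric: preempted time per tick */
};

/* Assumed to be driven by sched_switch when a vCPU thread is switched out. */
void on_switch_out(struct tracker *t, unsigned cpu, uint64_t now_ns)
{
    assert(cpu < t->nr_vcpus);
    t->vcpu[cpu].preempted = true;
    t->vcpu[cpu].since_ns = now_ns;
}

/* Assumed to be driven by sched_switch when the vCPU thread runs again. */
void on_switch_in(struct tracker *t, unsigned cpu, uint64_t now_ns)
{
    assert(cpu < t->nr_vcpus);
    struct vcpu_track *v = &t->vcpu[cpu];
    if (v->preempted) {
        v->phantom_ns += now_ns - v->since_ns;
        v->preempted = false;
    }
}

/* Called once per tick: close open intervals and publish the metric. */
void on_tick(struct tracker *t, uint64_t now_ns)
{
    uint64_t total = 0;
    for (unsigned i = 0; i < t->nr_vcpus; i++) {
        struct vcpu_track *v = &t->vcpu[i];
        if (v->preempted) {
            /* Still switched out: charge time up to the tick boundary. */
            v->phantom_ns += now_ns - v->since_ns;
            v->since_ns = now_ns;
        }
        total += v->phantom_ns;
        v->phantom_ns = 0;
    }
    t->phantom_permille = total * 1000 / t->tick_ns;
}
```

With a 1000 ns tick and two vCPUs, one preempted for 500 ns and another still preempted for the last 100 ns of the tick, the published value is 600, i.e. 0.6 vCPUs' worth of time lost in that tick.
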
> Phase 2: Low-Latency Shared Memory Bridge (Weeks 4 to 6 | June 15 to July 5)
> * Objective: Expose the eBPF host scheduler state to the guest user-space.
> * Tasks:
>   * Architect and implement the ivshmem (Inter-VM Shared Memory) mechanism.
>   * Map the eBPF map data (containing the phantom and idle averages)
>   directly into the guest VM's memory space.
>   * Write a minimal C test program inside the guest to verify
>   real-time, low-latency reads of the shared memory without invoking
>   heavy system calls.
> Phase 3: libgomp Paravirtualized Policies (Weeks 7 to 9 | July 6 to July 26)
> * Objective: Implement the Juunansei dynamic policies inside the OpenMP
> runtime.
> * Tasks:
>   * Apply and rebase Yuao Ma's draft patch for OMP_WAIT_POLICY
>   parsing in env.c, and replicate the logic to register
>   GOMP_DYNAMIC_POLICY.
>   * Modify gomp_dynamic_max_threads() for
>   GOMP_DYNAMIC_POLICY=pvsched: instantly scale down DoP if phantoms
>   exist, or conservatively scale up, requiring stability over 2
>   scheduler ticks. Implement juunansei_cpuset to pack threads on active
>   vCPUs.
>   * Rewrite the do_spin primitive for OMP_WAIT_POLICY=pvsched to
>   read the shared memory and immediately block() if the target peer is
>   flagged as a Phantom.
> Phase 4: Benchmarking & Upstreaming (Weeks 10 to 12 | July 27 to August 17)
> * Objective: Demonstrate the performance gain and upstream the code.
> * Tasks:
>   * Benchmark the new pvsched policies against the static baseline
>   (OMP_WAIT_POLICY=passive) using NAS Parallel Benchmarks (BT, CG, FT,
>   LU, MG, SP, UA) with Class B inputs.
>   * Simulate host-level oversubscription using a competing "Random
>   Spinners" workload managed via cgroups.
>   * Write comprehensive DejaGnu testsuite coverage for the new
>   environment variables and update the libgomp.texi documentation.
>   * Format the commits to strictly adhere to GCC's coding style
>   guidelines and submit the patch series to the gcc-patches mailing
>   list.
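
The Phase 3 scaling rules can be sketched in plain C as below. This is a standalone illustration, not libgomp code: pv_policy, pv_policy_init, pv_policy_tick and pv_affinity_mask are hypothetical names. Shrinking the DoP to N-P, growing it by one thread at a time, and applying the penalty only when phantoms show up on the tick right after a scale-up are all assumptions about details the proposal leaves open.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical policy state; names are illustrative, not from libgomp. */
struct pv_policy {
    unsigned nr_vcpus;               /* N: vCPUs visible to the guest */
    unsigned dop;                    /* current degree of parallelism */
    unsigned stable_ticks;           /* phantom-free ticks with idle cores */
    unsigned stability_requirement;  /* ticks needed before scaling up */
    int scaled_up_last_tick;         /* did the previous tick scale up? */
};

void pv_policy_init(struct pv_policy *p, unsigned nr_vcpus)
{
    p->nr_vcpus = nr_vcpus;
    p->dop = nr_vcpus;
    p->stable_ticks = 0;
    p->stability_requirement = 2;    /* the 2-tick stability check */
    p->scaled_up_last_tick = 0;
}

/* One policy step per scheduler tick; returns the new DoP. */
unsigned pv_policy_tick(struct pv_policy *p, unsigned phantoms, unsigned idle)
{
    if (phantoms > 0) {
        /* Phantoms right after scaling up: double the requirement. */
        if (p->scaled_up_last_tick)
            p->stability_requirement *= 2;
        /* Scale down immediately (assumed target: N - P, at least 1). */
        unsigned target = phantoms < p->nr_vcpus ? p->nr_vcpus - phantoms : 1;
        if (target < p->dop)
            p->dop = target;
        p->stable_ticks = 0;
        p->scaled_up_last_tick = 0;
    } else {
        p->scaled_up_last_tick = 0;
        /* Scale up by one thread only after enough stable ticks. */
        if (idle > 0 && p->dop < p->nr_vcpus
            && ++p->stable_ticks >= p->stability_requirement) {
            p->dop++;
            p->stable_ticks = 0;
            p->scaled_up_last_tick = 1;
        }
    }
    assert(p->dop >= 1 && p->dop <= p->nr_vcpus);
    return p->dop;
}

/* Bitmask with CPUs [0, N-P-1] set, for up to 64 vCPUs. */
uint64_t pv_affinity_mask(unsigned nr_vcpus, unsigned phantoms)
{
    unsigned usable = phantoms < nr_vcpus ? nr_vcpus - phantoms : 1;
    return usable >= 64 ? UINT64_MAX : (UINT64_C(1) << usable) - 1;
}
```

For example, with N = 8 and two phantoms the DoP drops to 6 at once and the mask covers CPUs 0 to 5. Two phantom-free ticks with idle cores then raise the DoP to 7. If phantoms reappear on the next tick, the DoP drops again and the stability requirement doubles to 4 ticks.
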
> Final Week (August 18 to August 24)
> * Buffer week for addressing final patch review comments from the
> mailing list, cleaning up documentation, and submitting the final GSoC
> evaluation.
> ________________
>
> Relevant Experience
> Core Languages and Tools: C, C++, Assembly (MIPS), Python, GDB, CMake,
> Git, Linux
> Projects and Involvement
> Systems Programming
> I implemented a cycle-accurate MIPS processor with a 5-stage pipeline
> in C++23. The implementation includes the hardwired pipeline and
> hazard detection logic for load-use stalls and branch delays. I am
> currently designing the Coprocessor 0 (CP0) unit, including the Memory
> Management Unit (MMU) and Translation Lookaside Buffer (TLB), which
> manage the transition of the processor state. Managing precise
> processor state transitions and execution hazard logic directly
> translates to understanding how vCPUs are scheduled, preempted, and
> tracked via Linux kernel tracepoints. This architectural intuition is
> exactly what is required to implement the eBPF Phantom Tracker.
> Source Code: https://github.com/Souradeep1101/CyclopsMIPS
> Compiler Programming
> I designed a highly optimized C11 ISO-compliant lexical analyzer from
> scratch. I have implemented strict tokenization rules and memory
> management at the character level to optimize source code parsing. I
> am currently designing the parser and semantic analysis stages,
> researching Abstract Syntax Tree optimization techniques to implement
> complex C type constraints. This rigorous, low-level approach to
> source code data parsing provides the exact foundational architecture
> knowledge necessary to navigate the GCC codebase, target the eBPF
> backend for the tracker, and safely parse the new GOMP_DYNAMIC_POLICY
> ICVs inside libgomp.
> Source Code: https://github.com/Souradeep1101/CCompiler
> Graphics Programming
> For Cyclops Studio, which is a high-frequency 2D animation engine, I
> have implemented a rendering pipeline using a CPU/GPU memory arena
> that removes allocation costs during batch submission. I have
> implemented strict buffer management using direct pointer arithmetic
> and memset for Vertex Buffer Object scratchpad management. This
> experience with high-frequency data structures and raw hardware
> management at a low level of abstraction prepares me to architect the
> low-latency ivshmem bridge between the host kernel and guest
> user-space.
> Source Code: https://github.com/Souradeep1101/CyclopsStudio
> Competitive Programming
> I placed 2nd at the 2025 ICPC North American Qualifier (ASU Site) and
> was a finalist at the ICPC Rocky Mountain Regional Contest,
> representing Arizona State University. I am also currently enrolled in
> an advanced algorithmic seminar with the ASU ICPC coach, and I am an
> officer at the ACM student chapter, where I organize algorithmic
> workshops known as "ICPC Primer." For these workshops I have written
> extensive documentation on the logic behind complex problem-solving.
> This training prepares me to isolate edge cases, reason about
> execution logic, and write highly optimized, memory-safe C/C++ code
> under time constraints. This algorithmic rigor is exactly what is
> needed to implement the low-latency Juunansei dynamic heuristics (DoP
> scaling and blocking) directly inside the libgomp runtime without
> introducing overhead.
> Workshop Documentation & Portfolio:
> https://www.souradeep.dev/blog/icpc-spring-2026-overview
>
> ________________
>
> Commitments and Availability
> Academic Status
> I am an international student from India, majoring in Computer Science
> (Software Engineering) at Arizona State University.
> I am on an accelerated track to complete my Bachelor's degree in three
> years, with an expected graduation date of May 2027. I am also on an
> accelerated track to complete my Master’s degree in one year, with an
> expected graduation date of May 2028.
> Summer Capacity
> I am fully prepared to treat GSoC as my primary summer commitment. I
> will dedicate 30 to 40 hours per week to the GCC project for the
> duration of the standard coding period to make sure the 350-hour
> requirement is comfortably met and exceeded.
> Schedule and Tokyo Study Abroad
> I will be participating in a study abroad program in Tokyo with
> Arizona State University from May 15 to May 31. I will frontload and
> work on my codebase research during the first half of the Community
> Bonding period (May 1 to May 14). During Week 1 of the coding period
> (May 25 to May 31), I will manage my hours around my program schedule
> and synchronize my availability with my mentors' time zones.
