Hi Apache Airavata community,

My name is Boyang Gong, and I am participating in Google Summer of Code 2026 with Apache Airavata. I am sending this initial update to introduce myself, share my current project direction, and start a public thread where I can post progress updates for the community.

My GSoC project page is available here:
https://summerofcode.withgoogle.com/programs/2026/projects/13nE0cqE

At this stage, my focus is shifting toward researching checkpoint/restart support for CPU and GPU workloads. The goal is to understand how running applications, especially NVIDIA GPU workloads, can be checkpointed and later restored so that long-running jobs can continue from a saved state.

As an initial literature and technical study, I will be reviewing the following resources:

- NVIDIA blog on checkpointing CUDA applications with CRIU: https://developer.nvidia.com/blog/checkpointing-cuda-applications-with-criu/
- CRIUgpu paper: https://arxiv.org/html/2502.16631v1

My current understanding is that CRIU can checkpoint the CPU/process portion of a running application, while NVIDIA's CUDA checkpointing support can handle the GPU/CUDA portion. I will first focus on understanding the basic mechanism and workflow for checkpointing and restoring CUDA applications. After that, I expect to explore how this capability could fit into the broader Airavata ecosystem, including possible future integration with Linkspan.

This is my initial plan based on recent mentor discussions, and I expect the details to evolve as I learn more and receive further guidance.

I am looking forward to contributing to Apache Airavata and learning from the community.

Best regards,
Boyang Gong
