Hi everyone,

I'd like to propose adding a new API to GraphAr that enables resource-efficient data retrieval for GNN (graph neural network) training workloads.

The motivation comes from recent work <https://arxiv.org/pdf/2411.11375> on scaling GNN training via graph databases. Their approach demonstrates memory savings but shows some bottlenecks (e.g., result conversion overhead). This proposal takes the same core idea and implements it as optimized operations directly at the GraphAr layer.

I've discussed this idea with a few PPMC members, including Sem Sinchenko, who has agreed to shepherd the proposal.

The full proposal document is here <https://docs.google.com/document/d/1oLShCWa9s__OItmwORglzm4oQoig4lzqJQZ1ZcpiEZY/edit?usp=sharing>.

I would appreciate any feedback on the scope, approach, and design direction.

Thanks,
Iskander Fakhrutdinov
