Hi everyone,

I hope you are doing well.

I have recently started contributing to Apache Wayang and had a few PRs
merged (related to repository updates and cleanup after graduation). I am
very interested in applying for GSoC 2026 with Wayang.

I am particularly interested in the "DataFrame API" project idea, as it
would significantly improve usability and make Wayang more accessible to
users familiar with tabular data abstractions.

Here is the approach I have in mind:

   - Designing a DataFrame abstraction (schema, rows, columns)
   - Supporting operations like select, filter, join, groupBy, aggregation
   - Translating these operations into Wayang execution plans
   - Ensuring compatibility with the optimizer
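
To make the steps above a bit more concrete, here is a rough sketch in
Python of how DataFrame calls could be recorded lazily as a logical plan,
which a translator would later map to Wayang operators. Everything in it
(PlanNode, DataFrame, explain) is a hypothetical name I made up for
illustration; none of it is existing Wayang API, and the real design may
look quite different.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class PlanNode:
    """One recorded logical operation (hypothetical, not a Wayang class)."""
    op: str
    args: Tuple = ()


@dataclass(frozen=True)
class DataFrame:
    """Immutable, lazy DataFrame sketch: each call appends a plan node.

    Nothing is executed here; a separate translator (not shown) would
    turn the recorded plan into a Wayang execution plan.
    """
    schema: Tuple[str, ...]
    plan: Tuple[PlanNode, ...] = ()

    def _check(self, *columns: str) -> None:
        # Validate column names against the schema before recording the step.
        missing = [c for c in columns if c not in self.schema]
        if missing:
            raise KeyError(f"unknown columns: {missing}")

    def select(self, *columns: str) -> "DataFrame":
        self._check(*columns)
        # Projection narrows the schema to the selected columns.
        return DataFrame(tuple(columns), self.plan + (PlanNode("select", columns),))

    def filter(self, column: str, predicate: Callable) -> "DataFrame":
        self._check(column)
        # Filtering keeps the schema unchanged.
        return DataFrame(self.schema, self.plan + (PlanNode("filter", (column, predicate)),))

    def explain(self) -> List[str]:
        # Operation names in the order they were recorded.
        return [node.op for node in self.plan]


people = DataFrame(("id", "name", "age"))
adults = people.filter("age", lambda age: age >= 18).select("name")
print(adults.schema)
print(adults.explain())
```

Keeping the DataFrame immutable and the plan purely declarative is one
possible way to leave reordering decisions (for example, pushing a filter
below a projection) to the optimizer rather than to the API layer.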

Before drafting my full proposal, I would love to get feedback from the
community:

   1. Are there any existing discussions or partial implementations around
      this?
   2. Are there preferred design directions or constraints I should be aware
      of?
   3. Any suggestions on how to scope this project better?

I am also happy to start contributing smaller PRs in this direction.

Looking forward to your guidance and feedback!

Best regards,
Sujay Barui
