Hey Frank, thanks for reaching out! I'm glad to hear you're interested
in this project. I'll try to answer your questions below.

> As I understand it, ManageIO serves as an abstraction layer between the
> Runner (pipeline) and the Connector, facilitating dynamic configuration,
> data access, and connector resource management. However, I am still
> uncertain about the specific responsibilities and boundaries of ManageIO.
> Could you please provide me with more precise and detailed requirements?

I'll try my best :)

Today, Beam has many IOs, some of which have been around for a long time.
Since those IOs were introduced, both Beam and the IOs themselves have
gained a number of features and best practices. Some of these best
practices are incompatible with the current IO setup, and ManagedIO is an
attempt to introduce a single transform which can easily be configured to
read/write to different sources/sinks and which complies with these best
practices.

The main requirements, then, are:

1) ManagedIO should support the same recommended read/write configurations
as our existing IOs. This code can hopefully be mostly reused, but we will
want to avoid exposing some deprecated configurations.
2) ManagedIO writes should support dynamic destinations, i.e. writing to a
different location based on a user-defined key. Example:
<https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/bigquery/DynamicDestinations.html>
3) ManagedIO should get schema'd input and schema'd output
<https://beam.apache.org/documentation/programming-guide/#schemas> from
every IO so that it can easily be used in cross-language transforms (this
will also unlock other future advantages, like the ability to easily upgrade
a transform without upgrading the whole pipeline).
--- 3a) To make it easy to upgrade from existing transforms, a set of
adaptors should be provided to convert between the current IO input/output
types and schemas.
4) ManagedIO should support a set of standard metrics for every IO (these
are still to be defined, but will be defined before the project begins).
5) ManagedIO should be able to drive the transform with
language-independent configuration (i.e. we should avoid language-specific
user-defined functions).
6) Every IO for ManagedIO should be well tested with integration tests and
well documented with examples.
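
To make requirements 2, 3, and 5 a bit more concrete, here is a rough
sketch in plain Python. It does not use the Beam API, and every name in it
(the config keys, managed_write, the type names) is made up purely for
illustration; the real ManagedIO design may look quite different. The idea
is just that the configuration is plain data with no user-defined
functions, rows carry a schema, and the destination is derived from a
field of each row.

```python
from collections import defaultdict

# Hypothetical mapping from language-neutral type names to Python types.
TYPES = {"string": str, "int64": int}

# Hypothetical configuration: plain data only, so it could equally be
# written as JSON or YAML and consumed from any SDK language.
config = {
    "schema": {"user": "string", "country": "string", "clicks": "int64"},
    # The destination is derived from a row field, not from a function.
    "destination_template": "events_{country}",
}


def validate(row, schema):
    """Check that a row has exactly the schema's fields, with matching types."""
    if set(row) != set(schema):
        raise ValueError(f"fields {sorted(row)} do not match {sorted(schema)}")
    for name, type_name in schema.items():
        if not isinstance(row[name], TYPES[type_name]):
            raise TypeError(f"field {name!r} should be of type {type_name}")
    return row


def managed_write(rows, config):
    """Group validated rows by the destination computed from each row."""
    destinations = defaultdict(list)
    for row in rows:
        validate(row, config["schema"])
        target = config["destination_template"].format(**row)
        destinations[target].append(row)
    return dict(destinations)


rows = [
    {"user": "a", "country": "us", "clicks": 3},
    {"user": "b", "country": "de", "clicks": 1},
    {"user": "c", "country": "us", "clicks": 7},
]
result = managed_write(rows, config)
for name in sorted(result):
    print(name, len(result[name]))
```

Because the routing rule lives in the configuration as a template rather
than as code, the same configuration could in principle be handed to a
Java, Python, or Go pipeline unchanged, which is the point of requirement 5.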

There may be some additional smaller requirements that emerge as we start
to build out the core transform, but I think that covers the main pieces.
It's also ok if you don't understand every piece of this yet; there will be
time to go over all of these requirements in depth if your proposal is
accepted. The high-level goal here, though, is to port existing IOs to use
best practices in a single, configurable way.

> I have reviewed some of the source code and realized that this is indeed
> a highly challenging project involving extensive interactions between
> numerous pieces of code. Therefore, I would like to inquire, whether the
> Beam community development team already has a tentative implementation
> plan. If such a plan exists, I am fully committed to contributing to its
> realization. If not, I would be honored to discuss potential strategies and
> learn design approaches alongside my esteemed seniors.

There's not a fully fleshed-out plan yet, but I expect that by the time the
GSoC project starts (by the end of May), there will be a more concrete
plan in place, with one or more IOs already onboarded to ManagedIO. One of
our goals here will be to reuse as much of our existing code as we can
while still complying with the ManagedIO requirements, so we won't be
starting from nothing.

> Has the project already attracted other candidates who might be more
> suitable?

No, I think you're the first person to ask about it here, and I'd
definitely encourage you to apply!

------------------------------------------------------------------------------------------------------------------------------------

Thanks again for your interest in the project, and please let me know if
you have any more questions!

Thanks,
Danny

On Sat, Feb 24, 2024 at 11:10 AM 2844167...@qq.co <juufvug...@foxmail.com>
wrote:

> Dear Mentor,
>
> Hello! I am Frank, a junior student from China, highly interested in
> participating in the development of the GSoC Apache Beam community project
> under your guidance. I am keen to learn about the development requirements
> for ManageIO within the project and would greatly appreciate any assistance
> you could provide. Thank you very much!
>
>    - As I understand it, ManageIO serves as an abstraction layer between
>    the Runner (pipeline) and the Connector, facilitating dynamic
>    configuration, data access, and connector resource management. However, I
>    am still uncertain about the specific responsibilities and boundaries of
>    ManageIO. Could you please provide me with more precise and detailed
>    requirements?
>    - I have reviewed some of the source code and realized that this is
>    indeed a highly challenging project involving extensive interactions
>    between numerous pieces of code. Therefore, I would like to inquire,
>    whether the Beam community development team already has a tentative
>    implementation plan. If such a plan exists, I am fully committed to
>    contributing to its realization. If not, I would be honored to discuss
>    potential strategies and learn design approaches alongside my esteemed
>    seniors.
>    - Has the project already attracted other candidates who might be more
>    suitable?
>
> Below is a brief introduction about myself:
>
> I am a junior student at Fuzhou University with three years of Java
> development experience, a passion for reading source code, and a firm
> believer in the open-source spirit. I am familiar with the source code of
> major projects such as Netty, Spring, and Disruptor and possess a solid
> foundation in object-oriented programming and abstract thinking. I have
> participated in several open-source activities in China and have gained
> some experience in the open-source community.
>
>
