Re: [DISCUSS] Actions to avoid the Attic

lisoda Tue, 17 Sep 2024 09:48:50 -0700

Yes, sir. I concur with your viewpoint. However, the current scenario is that the 
code review process for the Tez project is sluggish, and there is a severe lack 
of personnel. Moreover, since many Tez issues stem from the Hive project, a 
number of Hive project maintainers also possess development experience with 
Tez. I personally don't see any significant issues with the initiative to 
select suitable individuals from this group to join the Tez community as 
maintainers. After all, these individuals have indeed contributed code to Tez 
while improving Hive (similar situations exist in other communities, such as 
Pig). At the very least, we should augment our workforce to expedite the code 
review process. If the code review for user contributions is consistently 
delayed, it can also severely dampen the developers' enthusiasm for 
contributing code. However, in any case, this is just my personal opinion. 
Thank you for your response.




---- Replied Message ----
| From | Ayush Saxena<ayush...@gmail.com> |
| Date | 09/17/2024 23:54 |
| To | dev@tez.apache.org |
| Cc | |
| Subject | Re: [DISCUSS] Actions to avoid the Attic |
Just to clarify, the release was delayed because the release candidate (RC)
didn’t receive enough votes in time. Fortunately, this issue was resolved,
and we successfully released it with more than the required votes. I want
to make sure this thread doesn’t discourage any contributors or users of
the project (like myself). To be clear: Tez isn’t going anywhere, things
are under control, and for the record, we’ve had our second release of 2024.

On the topic of getting contributions from individuals with different
backgrounds, query engines, or projects that use Tez—there’s no need to be
on the PMC to contribute or share ideas. If anyone has something to offer,
feel free to start a new thread or create tickets. There are always folks
willing to help. As far as I know, merit in one Apache project doesn’t
automatically transfer to another, and if someone feels they can only
contribute if they’re on the PMC, that’s not a sustainable approach for any
project.

Regarding developers who contribute to Tez being mainly experienced users:
> the quality of their contributions should be high.


In my understanding, if it didn’t happen on the mailing list, it didn’t
happen. Trusting someone due to personal or professional connections
shouldn’t bypass the contributions made by actual project contributors.
Merit must be established in a way that’s recognized by the entire PMC in
line with how the Apache ecosystem operates. It’s not just about code but
community involvement too. CommunityOverCode :-)

Thanks to everyone for their suggestions. We managed to get the release
out, so I don’t think an inactive PMC, which could have pushed us toward
the Attic, is an issue. What we need now is to attract more contributions and
focus our energy on bugs, improvements, and designs. Everyone is welcome to
participate!

-Ayush
(A happy "Apache" Tez Contributor)


On Tue, 17 Sept 2024 at 20:20, lisoda <lis...@yeah.net> wrote:

> Hello.
>
>
> Although I'm not a member of PMC, I have been a long-term user of TEZ and
> would like to share my opinion about this matter.
>
>
>
> First and foremost, looking at all current computing engine-related
> communities, including but not limited to: batch processing engines, MPP
> databases, etc., research on DAG scheduling and execution is essentially
> stagnant. For instance, the update frequency of the DAG scheduling and
> execution module in Apache Spark's spark-core is actually quite low,
> similar to that of TEZ. Therefore, on the surface, the current state of the
> TEZ project appears to be poor, but in reality, I believe this is largely
> due to a lack of enthusiasm for research in the field of DAG scheduling and
> execution across the entire industry. It's just that TEZ is currently the
> only remaining independent open-source framework for studying DAG
> scheduling and execution, making the problem more apparent. Other projects
> may seem more active, but they are merely masking the issue because their
> main code contributions are not in the field of DAG scheduling and
> execution.
>
>
> Secondly, is it necessary to continue researching DAG scheduling? I
> believe it is obviously necessary. In fact, we can observe that the
> workloads in current user production environments are increasingly
> demanding for computing engines. Users expect computing engines to have
> higher efficiency and cost-effectiveness. To address these issues, apart
> from researching indexing technologies to enhance data-skip capabilities,
> the only other option is to provide better DAG scheduling and execution
> capabilities to achieve higher benefits. Many of us always think that Spark
> is very advanced, but in reality, as professional engineers, it is not
> difficult for us to find that Spark does not perform very well in DAG
> scheduling and execution. Compared to TEZ, it appears too rudimentary.
> Perhaps its only current advantage is that its code looks relatively neat
> and elegant. Moreover, many vendors provide "internal implementation"
> versions of DAG scheduling frameworks in their computing engine/database
> products, but from the current perspective, the vast majority have not
> surpassed TEZ. In the long run, it is only a matter of time before the
> industry shifts its research focus back to the field of DAG scheduling and
> execution.
>
>
>
> Additionally, regarding the issue of a lack of contributors to TEZ, I
> personally believe the following measures should be taken:
>
> Actively select Project Management Committee (PMC) members from various
> computing engine communities integrated with TEZ to become TEZ-PMC members.
> Since DAG scheduling and execution depend on actual workloads, almost no
> one runs TEZ in isolation. By integrating with other computing engines, PMC
> members from other communities can quickly identify potential issues with
> TEZ. DAG scheduling and execution are relatively abstract and complex
> matters. Studying them in isolation not only has a high barrier to entry
> but also lacks practical use cases, which cannot address existing problems.
> Furthermore, if some research departments or vendors have developed more
> feature-rich DAG scheduling frameworks based on TEZ, we should actively
> invite them to jointly develop and maintain TEZ (because optimizing DAG
> scheduling and execution is not an easy task).
>
> Minimize the selection criteria for contributors/PMC members as much as
> possible. This is because developers who can currently contribute to TEZ
> are essentially users with significant experience in TEZ. The caliber of
> these users is not likely to be poor, so there is no need for excessive
> screening. Attract users to develop and maintain the TEZ project to the
> greatest extent possible. As long as there are more people, the current
> problems will no longer be issues.
>
>
>
> That's all.
> Tks.
>
>
> Best
> Lisoda
