RE: [DISCUSS] Actions to avoid the Attic

lisoda Tue, 17 Sep 2024 07:49:11 -0700

Hello.


Although I'm not a member of the PMC, I have been a long-term user of TEZ and 
would like to share my opinion on this matter.



First and foremost, looking at all current computing engine-related 
communities, including but not limited to: batch processing engines, MPP 
databases, etc., research on DAG scheduling and execution is essentially 
stagnant. For instance, the update frequency of the DAG scheduling and 
execution module in Apache Spark's spark-core is actually quite low, similar to 
that of TEZ. Therefore, on the surface, the current state of the TEZ project 
appears to be poor, but in reality, I believe this is largely due to a lack of 
enthusiasm for research in the field of DAG scheduling and execution across the 
entire industry. It's just that TEZ is currently the only remaining independent 
open-source framework for studying DAG scheduling and execution, making the 
problem more apparent. Other projects may seem more active, but they are merely 
masking the issue because their main code contributions are not in the field of 
DAG scheduling and execution.


Secondly, is it necessary to continue researching DAG scheduling? I believe it 
clearly is. Workloads in users' production environments are placing increasing 
demands on computing engines: users expect higher efficiency and better 
cost-effectiveness. Apart from researching indexing technologies to improve 
data skipping, the only other way to meet these demands is to provide better 
DAG scheduling and execution. Many of us assume that Spark is very advanced, 
but in reality, 
as professional engineers, it is not difficult for us to find that Spark does 
not perform very well in DAG scheduling and execution. Compared to TEZ, it 
appears too rudimentary. Perhaps its only current advantage is that its code 
looks relatively neat and elegant. Moreover, many vendors provide "internal 
implementation" versions of DAG scheduling frameworks in their computing 
engine/database products, but from the current perspective, the vast majority 
have not surpassed TEZ. In the long run, it is only a matter of time before the 
industry shifts its research focus back to the field of DAG scheduling and 
execution.



Additionally, regarding the issue of a lack of contributors to TEZ, I 
personally believe the following measures should be taken:

Actively select Project Management Committee (PMC) members from various 
computing engine communities integrated with TEZ to become TEZ-PMC members. 
Since DAG scheduling and execution depend on actual workloads, almost no one 
runs TEZ in isolation. By integrating with other computing engines, PMC members 
from other communities can quickly identify potential issues with TEZ. DAG 
scheduling and execution are relatively abstract and complex topics. Studying 
them in isolation not only has a high barrier to entry but also lacks practical 
use cases, so it does little to address existing problems. Furthermore, if some 
research departments or vendors have developed more feature-rich DAG scheduling 
frameworks based on TEZ, we should actively invite them to jointly develop and 
maintain TEZ (because optimizing DAG scheduling and execution is not an easy 
task).

Lower the selection criteria for contributors/PMC members as much as possible. 
Developers who can currently contribute to TEZ are essentially users with 
significant TEZ experience. The caliber of these users is unlikely to be poor, 
so there is no need for excessive screening. Attract as many users as possible 
to develop and maintain the TEZ project. Once more people are involved, the 
current problems will cease to be problems.



That's all.
Thanks.


Best
Lisoda
