Hello.
Although I'm not a member of the PMC, I have been a long-term user of TEZ and would like to share my opinion on this matter.

First, across today's compute-engine communities, including but not limited to batch processing engines and MPP databases, research on DAG scheduling and execution has essentially stagnated. For instance, the DAG scheduling and execution module in Apache Spark's spark-core is updated about as rarely as TEZ is. So although TEZ looks unhealthy on the surface, I believe this largely reflects an industry-wide lack of enthusiasm for DAG scheduling and execution research. TEZ simply makes the problem more visible, because it is now the only remaining independent open-source framework dedicated to DAG scheduling and execution. Other projects may look more active, but that only masks the issue, since their main code contributions lie outside this area.

Second, is further research on DAG scheduling necessary? I believe it clearly is. Production workloads are placing ever greater demands on compute engines, and users expect higher efficiency and better cost-effectiveness. Apart from improving indexing to strengthen data skipping, the only other way to deliver those gains is better DAG scheduling and execution. Many of us assume Spark is highly advanced, but as professional engineers we can easily see that Spark does not perform particularly well in DAG scheduling and execution; compared to TEZ, it looks rudimentary. Perhaps its only current advantage here is that its code is relatively neat and elegant.
Moreover, many vendors ship their own in-house DAG scheduling frameworks inside their compute engine or database products, but so far the vast majority have not surpassed TEZ. In the long run, it is only a matter of time before the industry turns its research focus back to DAG scheduling and execution.

As for the shortage of TEZ contributors, I believe the following measures should be taken:

1. Actively recruit Project Management Committee (PMC) members from the compute-engine communities that integrate with TEZ to join the TEZ PMC. DAG scheduling and execution depend on real workloads, and almost no one runs TEZ in isolation. Through their integrations, PMC members from other communities can quickly spot potential issues in TEZ. DAG scheduling and execution are abstract and complex; studying them in isolation not only has a high barrier to entry but also lacks practical use cases, so it cannot address the existing problems.

2. If research departments or vendors have built more feature-rich DAG scheduling frameworks on top of TEZ, we should actively invite them to develop and maintain TEZ together with us, because optimizing DAG scheduling and execution is not an easy task.

3. Lower the bar for becoming a contributor or PMC member as much as possible. The developers who can contribute to TEZ today are essentially users with substantial TEZ experience. Such people are unlikely to be weak engineers, so there is no need for excessive screening.

4. Attract as many users as possible to develop and maintain the TEZ project. With enough people involved, the current problems will cease to be problems.

That's all. Thanks.

Best,
Lisoda