Hi Hongze,
Thank you very much for your detailed explanation. The background and roadmap for this topic are clear to me now.

Have a nice day!

Best Regards
Pan Yong

At 2025-05-13 21:09:34, "Hongze Zhang" <[email protected]> wrote:
>Hi Yong,
>
>Nice topic, thank you for bringing it up. To answer your first question:
>
>> My question is whether we have plan to support parallel mode in future, and
>> when if it is in the feature list?
>
>We had some discussions[1] about introducing Velox parallel execution
>to Gluten Velox backend, though no progress was made yet. The main
>reason it didn't go forward is, the effort for adopting Velox parallel
>execution could be huge, and the benefit of doing that is uncertain.
>As we know, query execution in vanilla Spark is also always
>parallelized, within the in-thread iterator execution model
>and shuffle (reparallelization). So it's basically about moving from one
>parallel execution strategy to another.
>
>Nowadays there are also plenty of debates around pull model vs push
>model in the database area, which is similar to the serial vs parallel
>comparison we are talking about here. From my limited perspective,
>these debates haven't led to a clear conclusion either. While we know
>with Velox's parallel execution the query plan could be broken into
>even smaller pipelines, so based on the push model theory so far,
>there might be a chance that query execution could have better
>resource utilization rate. But speaking of integrating that model with
>Spark, that will be another story because it will start from removing
>a bunch of non-trivial engineering blockers. A better reason for
>switching to the parallel model could be it's more Velox-native than
>the serial model we are currently using, because Meta develops Velox
>for replacing Presto's parallel executor from the very beginning.
>However, over time the serial model in Velox is also getting more
>serious usage as well, including Gluten's own use case.
>
>Hence, I think proposals like yours are not totally invalid but the
>community doesn't have a specific plan so far. But research or PoCs
>are definitely welcomed if anyone is interested.
>
>Moreover, IIUC, some folks in the community had some attempts for the
>CH backend around a similar topic; they may also be able to give
>some inputs here.
>
>Hongze
>
>[1] https://github.com/apache/incubator-gluten/issues/7810
>
>On Tue, May 13, 2025 at 1:49 AM YONG <[email protected]> wrote:
>>
>> Sorry. Correct my typo issue below. We use Task::next() now, but not
>> Task::start().
>>
>> At 2025-05-13 08:43:45, "YONG" <[email protected]> wrote:
>>
>> Hi all,
>>
>> Happy to be here! I am a newbie in spark and gluten, and have two questions
>> about gluten to ask.
>>
>> The first question is about the task's execution mode in gluten.
>>
>> From velox's source code, it seems that velox can support two execution
>> modes [velox/exec/Task.h enum class ExecutionMode]:
>> Serial Execution Mode: which uses single-thread to process the
>> task, and the API is Task::next()
>> Parallel Execution Mode: which uses multi-threads to process the
>> task, and the API is Task::start()
>>
>> In gluten's code [WholeStageResultIterator::next()], we only use velox's
>> serial execution mode [Task::next()] now.
>> I guess maybe velox is developed by Meta to replace presto's engine at
>> first, and the presto's task can be run in multi-threads. But in Spark, the
>> task should be run in single-thread, which corresponds to one core in one
>> executor. I am not sure about the effort to implement velox's parallel mode
>> in gluten.
>> My question is whether we have plan to support parallel mode in future, and
>> when if it is in the feature list?
>>
>> The second question is about profiling tool.
>>
>> I want to collect the C++ code's hotspot & Flame Graph in one query, and to
>> see which function in velox is the critical path in my case.
>> I just find the memo about clickhouse backend
>> (incubator-gluten/docs/developers/UsingGperftoolsInCH.md at main ·
>> apache/incubator-gluten · GitHub). Is there any memo which I can follow
>> about velox backend?
>>
>> Thanks a lot
>>
>> Best Regards
>> Pan Yong
