Re:Re: Questions about the parallel execution mode and the perf tool

YONG Tue, 13 May 2025 16:18:44 -0700

Hi HongZe,

Thank you very much for your detailed explanation. The background and roadmap 
for this topic are clear to me now.

Have a nice day!

Best Regards
Pan Yong

At 2025-05-13 21:09:34, "Hongze Zhang" <[email protected]> wrote:

>Hi Yong,
>
>Nice topic, thank you for bringing it up. To answer your first question:
>
>> My question is whether we have plan to support parallel mode in future, and 
>> when if it is in the feature list?
>
>We have had some discussions[1] about introducing Velox parallel execution
>to the Gluten Velox backend, though no progress has been made yet. The main
>reason it hasn't gone forward is that the effort of adopting Velox parallel
>execution could be huge, and the benefit of doing so is uncertain.
>As we know, query execution in vanilla Spark is already
>parallelized, through the in-thread iterator execution model
>and shuffle (reparallelization). So it's basically about moving from one
>parallel execution strategy to another.
>
>Nowadays there are also plenty of debates around the pull model vs. the
>push model in the database area, which is similar to the serial vs.
>parallel comparison we are talking about here. From my limited perspective,
>these debates haven't led to a clear conclusion either. We do know that
>with Velox's parallel execution the query plan could be broken into
>even smaller pipelines, so based on the push model theory so far,
>there might be a chance that query execution achieves better
>resource utilization. But integrating that model with Spark is
>another story, because it would have to start with removing
>a bunch of non-trivial engineering blockers. A better reason for
>switching to the parallel model could be that it's more Velox-native than
>the serial model we are currently using, because Meta developed Velox
>to replace Presto's parallel executor from the very beginning.
>However, over time the serial model in Velox has also been getting more
>serious usage, including Gluten's own use case.
>
>Hence, I think proposals like yours are not totally invalid, but the
>community doesn't have a specific plan so far. Research or PoCs
>are definitely welcome if anyone is interested.
>
>Moreover, IIUC, some folks in the community have made some attempts for
>the CH backend around a similar topic; they may also be able to give
>some input here.
>
>Hongze
>
>[1] https://github.com/apache/incubator-gluten/issues/7810
>
>On Tue, May 13, 2025 at 1:49 AM YONG <[email protected]> wrote:
>>
>> Sorry, let me correct a typo below: we use Task::next() now, not 
>> Task::start().
>>
>> At 2025-05-13 08:43:45, "YONG" <[email protected]> wrote:
>>
>> Hi all,
>>
>> Happy to be here! I am a newbie to Spark and Gluten, and I have two 
>> questions about Gluten.
>>
>> The first question is about the task execution mode in Gluten.
>>
>> From Velox's source code, it seems that Velox supports two execution 
>> modes [velox/exec/Task.h, enum class ExecutionMode]:
>>   Serial execution mode:   uses a single thread to process the task; 
>>                            the API is Task::next()
>>   Parallel execution mode: uses multiple threads to process the task; 
>>                            the API is Task::start()
>>
>> In Gluten's code [WholeStageResultIterator::next()], we currently use only 
>> Velox's serial execution mode [Task::next()].
>> My guess is that Velox was originally developed by Meta to replace Presto's 
>> engine, and a Presto task can run on multiple threads. In Spark, however, a 
>> task should run on a single thread, which corresponds to one core in one 
>> executor. I am not sure how much effort it would take to implement Velox's 
>> parallel mode in Gluten.
>> My question is whether we have plan to support parallel mode in future, and 
>> when if it is in the feature list?
>>
>> The second question is about profiling tools.
>>
>> I want to collect the C++ hotspots and a flame graph for one query, to see 
>> which Velox functions are on the critical path in my case. I have only found 
>> the memo for the ClickHouse backend 
>> (incubator-gluten/docs/developers/UsingGperftoolsInCH.md at main · 
>> apache/incubator-gluten · GitHub). Is there a similar memo that I can follow 
>> for the Velox backend?
>>
>> Thanks a lot
>>
>> Best Regards
>> Pan Yong
>>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: [email protected]
>For additional commands, e-mail: [email protected]
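
For readers following the thread, the serial (pull) versus parallel (push)
contrast discussed above can be illustrated with a toy sketch. This is not
Velox or Gluten code: scan, filter_even, run_serial and run_parallel are
hypothetical names, and the only idea borrowed from the thread is that the
serial mode is driven by the caller pulling results on one thread, while the
parallel mode hands batches to several worker threads.

```python
import queue
import threading
from typing import Iterator, List

Batch = List[int]


def scan(n_batches: int, batch_size: int) -> Iterator[Batch]:
    """Hypothetical source operator: yields consecutive integer batches."""
    for b in range(n_batches):
        start = b * batch_size
        yield list(range(start, start + batch_size))


def filter_even(batches: Iterator[Batch]) -> Iterator[Batch]:
    """Hypothetical filter operator: keeps even values of each batch."""
    for batch in batches:
        yield [x for x in batch if x % 2 == 0]


def run_serial(n_batches: int, batch_size: int) -> int:
    """Pull style: the calling thread drives the whole pipeline by
    repeatedly asking the last operator for the next batch."""
    total = 0
    for batch in filter_even(scan(n_batches, batch_size)):
        total += sum(batch)
    return total


def run_parallel(n_batches: int, batch_size: int, n_drivers: int = 4) -> int:
    """Push style: the source pushes batches into a queue and several
    worker threads process them concurrently."""
    work: "queue.Queue" = queue.Queue()
    partials = [0] * n_drivers  # one slot per worker, so no lock is needed

    def driver(idx: int) -> None:
        while True:
            batch = work.get()
            if batch is None:  # sentinel: no more input for this worker
                return
            partials[idx] += sum(x for x in batch if x % 2 == 0)

    threads = [threading.Thread(target=driver, args=(i,)) for i in range(n_drivers)]
    for t in threads:
        t.start()
    for batch in scan(n_batches, batch_size):
        work.put(batch)
    for _ in threads:
        work.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return sum(partials)
```

Both functions compute the same result; they differ only in who drives the
pipeline and how many threads take part, which is the trade-off the thread
is weighing against Spark's one-thread-per-task model.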
