In addition to that:
For now, some stateful operations in Structured Streaming don't have an
equivalent Python API, e.g. flatMapGroupsWithState. However, the Spark
engineers are making it possible in an upcoming version. See more:
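For context, a minimal sketch of the Scala-only API being discussed; the
Event and RunningCount case classes and the events Dataset are assumptions
for illustration, and an active SparkSession named spark is assumed:

    import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout, OutputMode}
    import spark.implicits._

    case class Event(user: String, value: Long)   // hypothetical input schema
    case class RunningCount(count: Long)          // hypothetical state type

    // Keep a running per-user event count across micro-batches.
    val counts = events                           // events: Dataset[Event], assumed
      .groupByKey(_.user)
      .flatMapGroupsWithState[RunningCount, (String, Long)](
          OutputMode.Update(), GroupStateTimeout.NoTimeout()) {
        (user, rows, state) =>
          val total = state.getOption.map(_.count).getOrElse(0L) + rows.size
          state.update(RunningCount(total))       // persist state for the next batch
          Iterator((user, total))
      }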
This is exactly what we ended up doing! The only drawback I saw with this
approach is that the GPU tasks get pretty big (in terms of data and compute
time), and task failures become expensive. That's why I reached out to the
mailing list in the first place.
Normally I try to aim for anything
Now I see what you want to do. If you have access to the cluster
configuration files, you can modify the spark-env.sh file on the worker
nodes to specify exactly which nodes you'd like to link with GPU cores
and which not. This would allow only those nodes configured with
GPU resources
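A minimal sketch of what that could look like for Spark standalone, assuming
4 GPUs per GPU node and the discovery script that ships with Spark; nodes
without GPUs simply omit these lines:

    # spark-env.sh on a GPU-equipped worker node (standalone mode)
    # Advertise 4 GPUs and a script that reports their addresses.
    SPARK_WORKER_OPTS="-Dspark.worker.resource.gpu.amount=4 \
      -Dspark.worker.resource.gpu.discoveryScript=/opt/spark/examples/src/main/scripts/getGpusResources.sh"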
Stage-level scheduling does not allow you to change configs right now. This is
something we thought about as a follow-on but have never implemented. How many
tasks in the DL stage are you running? The typical case is: run some ETL with
lots of tasks... do mapPartitions and then run your DL stuff,
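A minimal sketch of that ETL-then-DL pattern with the stage-level scheduling
API; the input path, partition count, resource amounts, and the runModel
function are assumptions:

    import org.apache.spark.resource.{ExecutorResourceRequests, ResourceProfileBuilder, TaskResourceRequests}

    // Ask for GPU executors only for the DL stage.
    val gpuProfile = new ResourceProfileBuilder()
      .require(new ExecutorResourceRequests().cores(4).resource("gpu", 1))
      .require(new TaskResourceRequests().resource("gpu", 1))
      .build

    val etl = spark.read.parquet("/data/input").rdd   // ETL runs under the default (CPU) profile
    val scored = etl
      .repartition(16)                  // fewer, larger partitions for the GPU stage
      .withResources(gpuProfile)        // stage-level scheduling (SPARK-27495, Spark 3.1+)
      .mapPartitions(runModel)          // runModel: Iterator[Row] => Iterator[String], assumed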
Er, wait, this is what stage-level scheduling is, right? This has existed
since 3.1:
https://issues.apache.org/jira/browse/SPARK-27495
Interesting discussion here; it looks like Spark does not support configuring
a different number of executors in different stages. Would love to see the
community come out with such a feature.
Thanks again Artemis, I really appreciate it. I have watched the video but did
not find an answer.
Please bear with me for just one more iteration. Maybe I'll be more specific:
Suppose I start the application with maxExecutors=500, executors.cores=2,
because that's the amount of resources needed
Shay, you may find this video helpful (with some API code samples that
you are looking for):
https://www.youtube.com/watch?v=JNQu-226wUc=171s. The issue here
isn't how to limit the number of executors but to request the right
GPU-enabled executors dynamically. Those executors used in
Well, your mileage varies, so to speak.
- Spark itself is written in Scala. However, that does not imply you
should stick with Scala.
- I have used both for Spark Streaming and Spark Structured Streaming;
they both work fine.
- PySpark has become popular with the widespread use of
Thanks Artemis. We are not using Rapids, but rather using GPUs through the
Stage Level Scheduling feature with ResourceProfile. In Kubernetes you have to
turn on shuffle tracking for dynamic allocation, anyhow.
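For reference, a minimal sketch of those settings at session startup; the app
name is an assumption, and the executor numbers are the ones from the earlier
message:

    import org.apache.spark.sql.SparkSession

    // On Kubernetes there is no external shuffle service, so dynamic
    // allocation needs shuffle tracking enabled instead.
    val spark = SparkSession.builder()
      .appName("gpu-dl-pipeline")       // assumed
      .config("spark.dynamicAllocation.enabled", "true")
      .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
      .config("spark.dynamicAllocation.maxExecutors", "500")
      .config("spark.executor.cores", "2")
      .getOrCreate()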
The question is how we can limit the number of executors when building a new