Listeners are the best resources to the allocation manager afaik... It already has SparkListener <https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala#L640> that it utilizes. We can use it to extract more information (like processing times). The one with more information regarding streaming query resides in sql module <https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryListener.scala> though.
Thanks Pavan On Tue, Aug 8, 2023 at 5:43 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote: > Hi Pavan or anyone else > > Is there any way one access the matrix displayed on SparkGUI? For example > the readings for processing time? Can these be acessed? > > Thanks > > For example, > Mich Talebzadeh, > Solutions Architect/Engineering Lead > London > United Kingdom > > > view my Linkedin profile > <https://urldefense.com/v3/__https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/__;!!NCc8flgU!d-qX4RylsnHucGkE4OdsO8agaKMFV59tVQnWZL1FbbZLVLWVUWgWmiiKC1Mvyy-796X-uP5XZfjLEbrVfe771d6VrCySTg$> > > > https://en.everybodywiki.com/Mich_Talebzadeh > <https://urldefense.com/v3/__https://en.everybodywiki.com/Mich_Talebzadeh__;!!NCc8flgU!d-qX4RylsnHucGkE4OdsO8agaKMFV59tVQnWZL1FbbZLVLWVUWgWmiiKC1Mvyy-796X-uP5XZfjLEbrVfe771d4r4xOqSg$> > > > > *Disclaimer:* Use it at your own risk. Any and all responsibility for any > loss, damage or destruction of data or any other property which may arise > from relying on this email's technical content is explicitly disclaimed. > The author will in no case be liable for any monetary damages arising from > such loss, damage or destruction. > > > > > On Tue, 8 Aug 2023 at 06:44, Pavan Kotikalapudi <pkotikalap...@twilio.com> > wrote: > >> Thanks for the review Mich, >> >> Yes, the configuration parameters we end up setting would be based on the >> trigger interval. >> >> > If you are going to have additional indicators why not look at >> scheduling delay as well >> Yes. The implementation is based on scheduling delays, not for pending >> tasks of the current stage but rather pending tasks of all the stages in >> a micro-batch >> <https://urldefense.com/v3/__https://github.com/apache/spark/pull/42352/files*diff-fdddb0421641035be18233c212f0e3ccd2d6a49d345bd0cd4eac08fc4d911e21R1025__;Iw!!NCc8flgU!d-qX4RylsnHucGkE4OdsO8agaKMFV59tVQnWZL1FbbZLVLWVUWgWmiiKC1Mvyy-796X-uP5XZfjLEbrVfe771d6feoFH2Q$> >> (hence >> trigger interval). >> >> > we ought to utilise the historical statistics collected under the >> checkpointing directory to get more accurate statistics >> You are right! This is just a simple implementation based on one factor, >> we should also look into other indicators as well If that would help build >> a better scaling algorithm. >> >> Thank you, >> >> Pavan >> >> On Mon, Aug 7, 2023 at 9:55 PM Mich Talebzadeh <mich.talebza...@gmail.com> >> wrote: >> >>> Hi, >>> >>> I glanced over the design doc. >>> >>> You are providing certain configuration parameters plus some settings >>> based on static values. For example: >>> >>> spark.dynamicAllocation.schedulerBacklogTimeout": 54s >>> >>> I cannot see any use of <processing time> which ought to be at least >>> half of the batch interval to have the correct margins (confidence level). >>> If >>> you are going to have additional indicators why not look at scheduling >>> delay as well. Moreover most of the needed statistics are also available to >>> set accurate values. My inclination is that this is a great effort but >>> we ought to utilise the historical statistics collected under >>> checkpointing directory to get more accurate statistics. I will review >>> the design document in duew course >>> >>> HTH >>> >>> Mich Talebzadeh, >>> Solutions Architect/Engineering Lead >>> London >>> United Kingdom >>> >>> >>> view my Linkedin profile >>> <https://urldefense.com/v3/__https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/__;!!NCc8flgU!blQ5zGotPbReMPXKaZw50BES4V_1AKqHv6bIxHVlc0QfY9iisFjT-u0be3CR6C6-41dtKLX5Ija0-EmAYfkcxLFr9YSZnw$> >>> >>> >>> https://en.everybodywiki.com/Mich_Talebzadeh >>> <https://urldefense.com/v3/__https://en.everybodywiki.com/Mich_Talebzadeh__;!!NCc8flgU!blQ5zGotPbReMPXKaZw50BES4V_1AKqHv6bIxHVlc0QfY9iisFjT-u0be3CR6C6-41dtKLX5Ija0-EmAYfkcxLEPx44C1w$> >>> >>> >>> >>> *Disclaimer:* Use it at your own risk. Any and all responsibility for >>> any loss, damage or destruction of data or any other property which may >>> arise from relying on this email's technical content is explicitly >>> disclaimed. The author will in no case be liable for any monetary damages >>> arising from such loss, damage or destruction. >>> >>> >>> >>> >>> On Tue, 8 Aug 2023 at 01:30, Pavan Kotikalapudi >>> <pkotikalap...@twilio.com.invalid> wrote: >>> >>>> Hi Spark Dev, >>>> >>>> I have extended traditional DRA to work for structured streaming >>>> use-case. >>>> >>>> Here is an initial Implementation draft PR >>>> https://github.com/apache/spark/pull/42352 >>>> <https://urldefense.com/v3/__https://github.com/apache/spark/pull/42352__;!!NCc8flgU!blQ5zGotPbReMPXKaZw50BES4V_1AKqHv6bIxHVlc0QfY9iisFjT-u0be3CR6C6-41dtKLX5Ija0-EmAYfkcxLHLe7WCUw$> >>>> and >>>> design doc: >>>> https://docs.google.com/document/d/1_YmfCsQQb9XhRdKh0ijbc-j8JKGtGBxYsk_30NVSTWo/edit?usp=sharing >>>> <https://urldefense.com/v3/__https://docs.google.com/document/d/1_YmfCsQQb9XhRdKh0ijbc-j8JKGtGBxYsk_30NVSTWo/edit?usp=sharing__;!!NCc8flgU!blQ5zGotPbReMPXKaZw50BES4V_1AKqHv6bIxHVlc0QfY9iisFjT-u0be3CR6C6-41dtKLX5Ija0-EmAYfkcxLFAjJfilg$> >>>> >>>> Please review and let me know what you think. >>>> >>>> Thank you, >>>> >>>> Pavan >>>> >>>