Re: Max number of streams supported?
Thanks, Michael and TD, for the quick replies. They were helpful. I will let you know the numbers (limits) I find in my experiments.

On Wed, Jan 31, 2018 at 3:10 PM, Tathagata Das wrote:
> Just to clarify a subtle difference between DStreams and Structured Streaming. Having multiple input streams in a DStreamGraph likely means they are all being processed/computed in the same way, as there can be only one streaming query/context active in the StreamingContext. In Structured Streaming, however, there can be any number of independent streaming queries (i.e., different computations), and each streaming query can have any number of separate input sources. So Michael's comment that "each stream will have a thread on the driver" is correct when many independent queries with different computations are running simultaneously. However, if all your streams need to be processed in the same way, then it's one streaming query with many inputs, and it will require one thread.
>
> Hope this helps.
>
> TD
>
> On Wed, Jan 31, 2018 at 12:39 PM, Michael Armbrust wrote:
>> -dev +user
>>
>>> Similarly for Structured Streaming, would there be any limit on the number of streaming sources I can have?
>>
>> There is no fundamental limit, but each stream will have a thread on the driver that is doing coordination of execution. We comfortably run 20+ streams on a single cluster in production, but I have not pushed the limits. You'd want to test with your specific application.
Re: Max number of streams supported?
Just to clarify a subtle difference between DStreams and Structured Streaming. Having multiple input streams in a DStreamGraph likely means they are all being processed/computed in the same way, as there can be only one streaming query/context active in the StreamingContext. In Structured Streaming, however, there can be any number of independent streaming queries (i.e., different computations), and each streaming query can have any number of separate input sources. So Michael's comment that "each stream will have a thread on the driver" is correct when many independent queries with different computations are running simultaneously. However, if all your streams need to be processed in the same way, then it's one streaming query with many inputs, and it will require one thread.

Hope this helps.

TD

On Wed, Jan 31, 2018 at 12:39 PM, Michael Armbrust wrote:
> -dev +user
>
>> Similarly for Structured Streaming, would there be any limit on the number of streaming sources I can have?
>
> There is no fundamental limit, but each stream will have a thread on the driver that is doing coordination of execution. We comfortably run 20+ streams on a single cluster in production, but I have not pushed the limits. You'd want to test with your specific application.
Re: Max number of streams supported?
-dev +user

> Similarly for Structured Streaming, would there be any limit on the number of streaming sources I can have?

There is no fundamental limit, but each stream will have a thread on the driver that is doing coordination of execution. We comfortably run 20+ streams on a single cluster in production, but I have not pushed the limits. You'd want to test with your specific application.