Hey Jeremy,

Samza will be fine, but at this scale you need to start worrying about Kafka and YARN. 1 million jobs will likely start to put pressure on YARN's RM, through both memory usage and CPU usage for the scheduler. With 1 million jobs, assuming 1 container each, you'll have over 1 million connections to Kafka, which means you'll need enough brokers to handle those connections.
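
As a rough back-of-the-envelope sketch of the connection math above (in Python): the function name and every default here are hypothetical placeholders. In particular, the budget of 10,000 connections per broker is an assumption for illustration, not a Kafka limit, so plug in whatever your own brokers can actually sustain.

```python
import math

def estimate_kafka_load(jobs,
                        containers_per_job=1,
                        connections_per_container=1,
                        max_connections_per_broker=10_000):
    """Return (total_connections, brokers_needed) for a given job count.

    All defaults are illustrative assumptions, not measured values:
    - containers_per_job=1 matches the "1 container each" case above.
    - connections_per_container=1 is a lower bound; a container may
      hold connections to more than one broker.
    - max_connections_per_broker is a made-up budget per broker.
    """
    total_connections = jobs * containers_per_job * connections_per_container
    brokers_needed = math.ceil(total_connections / max_connections_per_broker)
    return total_connections, brokers_needed

if __name__ == "__main__":
    total, brokers = estimate_kafka_load(1_000_000)
    print(f"connections: {total:,}, brokers needed: {brokers}")
```

Under these assumed numbers the estimate scales linearly, so raising connections_per_container to 3 would triple the broker count.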

Can you describe your use case in more detail? Running 1 million jobs seems like it might be a mis-use of this technology.

Cheers,
Chris

On Wed, Apr 15, 2015 at 10:24 AM, jeremy p <athomewithagroove...@gmail.com> wrote:

> What's the maximum number of Samza jobs I can run simultaneously on a
> single cluster? Let's say these jobs are very lightweight -- they require
> little memory or processing power. However, I need a lot of them -- let's
> say I need to have 1,000,000 running at any given time. Is this reasonable
> or even possible?
>