Hi Seweryn,

It's a little hard to say. For one thing, extra threads have some overhead
of their own, but I agree with you that the bulk of the extra memory would
come from the extra throughput you're able to drive through the application.

I haven't done any analysis of this before, so I'm just reasoning about it
(as opposed to speaking from experience):

In the maximum case, doubling your thread count would double your memory
usage. This is for an "ideal" CPU-bound process. In reality, there are
shared resources, such as network and disk, that should prevent you from
reaching this bound.

In the minimum case, if the app is already saturating some other resource,
like network, disk, or even memory, then increasing the thread count would
not add an appreciable amount of memory. The reason is that if the app is
saturating, say, the network already, then more threads don't change that
fact, and you still can't increase the throughput.
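
To put numbers on those two bounds, here is a toy calculation. It is only a
sketch of the reasoning above, not a measurement: the function name and the
512 MB figure are made up for illustration, and it ignores the per-thread
overhead I mentioned at the top.

```python
from typing import Tuple


def memory_bounds_mb(one_thread_mb: float, threads: int) -> Tuple[float, float]:
    """Return (lower, upper) memory estimates for a given thread count.

    one_thread_mb is the memory observed with a single thread (assumed input).
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    # Minimum case: some other resource is already saturated, so extra
    # threads add no throughput and no appreciable memory.
    lower = one_thread_mb
    # Maximum case: ideal CPU-bound process, memory scales with thread count.
    upper = one_thread_mb * threads
    return lower, upper


# Hypothetical example: 512 MB with one thread, going to two threads.
print(memory_bounds_mb(512.0, 2))  # (512.0, 1024.0)
```

The real figure for your app should land somewhere between the two values.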

As for a concrete answer to your question, I think you're unfortunately
the only one with enough visibility to predict the memory load. It would be
very dependent on your machines, network, the number of topics and
partitions, the size of your records in each partition, what exactly your
Streams app does, and even your broker configuration.

However, I'd propose the following experimental strategy to try and get a
handle on it:
1. start with one thread. Observe all the main resources (CPU, network i/o,
disk i/o), but especially memory. For memory, pay particular attention to
the memory used immediately after GC. You might want to turn on GC logging
to help with this.
1b. observe these metrics for long enough for a stable trend to emerge.
This might be hours or even a day.
2. add one more thread. Continue observing all the resources. As I said, in
the ideal case, this should double your throughput and hence double your
memory usage. Seeing how much each metric increases when you add the
second thread should help you start building a model of the increase to
expect for each extra thread.
3. continue the experiment, adding one thread each time. At some point,
you'll notice that the throughput/memory increase drops off when you add an
extra thread. This means that you've saturated one or more other resources.
The metrics for those resources should corroborate this.
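
Once you have one stable post-GC memory figure per thread count, the
drop-off in step 3 can be spotted with a few lines of analysis. The sketch
below is one possible way to do it: the function names, the 0.5 threshold,
and the sample numbers are all assumptions for illustration, not anything
Streams provides. The same calculation works for throughput figures.

```python
from typing import List, Optional


def marginal_increases(post_gc_mb: List[float]) -> List[float]:
    """Memory added by each extra thread; entry i is the step from i+1 to i+2 threads."""
    return [after - before for before, after in zip(post_gc_mb, post_gc_mb[1:])]


def last_useful_thread_count(post_gc_mb: List[float],
                             drop_ratio: float = 0.5) -> Optional[int]:
    """Return the thread count beyond which an extra thread added less than
    drop_ratio times the first increment, or None if no drop-off was seen.

    post_gc_mb[0] is the stable post-GC memory with 1 thread, post_gc_mb[1]
    with 2 threads, and so on. drop_ratio is an arbitrary, assumed threshold.
    """
    deltas = marginal_increases(post_gc_mb)
    if not deltas:
        return None  # only one measurement: nothing to compare yet
    first = deltas[0]
    if first <= 0:
        return 1  # the second thread added nothing: already saturated at 1
    for i, delta in enumerate(deltas):
        if delta < first * drop_ratio:
            return i + 1  # deltas[i] is the step from i+1 to i+2 threads
    return None


# Made-up measurements (MB after GC) for 1, 2, 3, and 4 threads.
samples = [500.0, 900.0, 1300.0, 1350.0]
print(marginal_increases(samples))        # [400.0, 400.0, 50.0]
print(last_useful_thread_count(samples))  # 3
```

With these made-up numbers, the fourth thread adds almost nothing, which
would suggest some other resource saturated at three threads.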

Note that, if nothing else, the CPU should become saturated once the number
of threads is equal to the number of cores. Increasing the thread count
much beyond this shouldn't help much.
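
If you want a quick upper limit for the experiment, you could cap the
thread count at the number of cores on the machine. This is just a sketch
of that rule of thumb; max_useful_threads is a hypothetical helper, and it
assumes the script runs on the same machine as the Streams app.

```python
import os


def max_useful_threads(requested: int) -> int:
    """Cap a requested thread count at the number of cores on this machine."""
    if requested < 1:
        raise ValueError("requested must be at least 1")
    cores = os.cpu_count() or 1  # cpu_count() can return None; fall back to 1
    return min(requested, cores)


# Example: ask for 64 threads, get at most one per core.
print(max_useful_threads(64))
```

The result would be the largest value worth trying for num.stream.threads
in the experiment above, not a recommended setting.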

I hope this helps!

On Fri, Aug 31, 2018 at 1:02 AM Seweryn Habdank-Wojewodzki (JIRA) <
j...@apache.org> wrote:

> Seweryn Habdank-Wojewodzki created KAFKA-7363:
> -------------------------------------------------
>
>              Summary: How num.stream.threads in streaming application
> influence memory consumption?
>                  Key: KAFKA-7363
>                  URL: https://issues.apache.org/jira/browse/KAFKA-7363
>              Project: Kafka
>           Issue Type: Task
>             Reporter: Seweryn Habdank-Wojewodzki
>
>
> Dears,
>
> How option _num.stream.threads_ in streaming application influence memory
> consumption?
> I see that by increasing num.stream.threads my application needs more
> memory.
> This is obvious, but it is not obvious how much I need to give it. Trial and
> error method does not work, as it seems to be highly dependent on forced
> throughput.
> I mean: higher load more memory is needed.
>
> Thanks for help and regards,
> Seweryn.
>
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v7.6.3#76005)
>
