After looking at a few stack traces, I think that in the benchmark application the operators compete for the circular buffer that passes slices from the emitter output to the consumer input, and that the sleeps used to avoid busy-waiting are too long for the benchmark operators. I don't see a stack like the one below every time I take a thread dump, but I see it often enough to suspect that the sleep is the root cause. I'll recompile with a smaller sleep time and see how it affects performance.

----
"1/wordGenerator:RandomWordInputModule" prio=10 tid=0x00007f78c8b8c000 nid=0x780f waiting on condition [0x00007f78abb17000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at com.datatorrent.netlet.util.CircularBuffer.put(CircularBuffer.java:182)
    at com.datatorrent.stram.stream.InlineStream.put(InlineStream.java:79)
    at com.datatorrent.stram.stream.MuxStream.put(MuxStream.java:117)
    at com.datatorrent.api.DefaultOutputPort.emit(DefaultOutputPort.java:48)
    at com.datatorrent.benchmark.RandomWordInputModule.emitTuples(RandomWordInputModule.java:108)
    at com.datatorrent.stram.engine.InputNode.run(InputNode.java:115)
    at com.datatorrent.stram.engine.StreamingContainer$2.run(StreamingContainer.java:1377)

"2/counter:WordCountOperator" prio=10 tid=0x00007f78c8c98800 nid=0x780d waiting on condition [0x00007f78abc18000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at com.datatorrent.stram.engine.GenericNode.run(GenericNode.java:519)
    at com.datatorrent.stram.engine.StreamingContainer$2.run(StreamingContainer.java:1377)

----
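
The pattern suspected above (a producer that sleeps while the buffer is full and a consumer that sleeps while it is empty) can be sketched roughly as follows. This is only an illustration and not the netlet CircularBuffer source: the class SleepOnFullBuffer, its fields, and the transfer helper are hypothetical names, and the sleep values tried in main are arbitrary. The point is that each time one side finds the buffer full or empty it gives up at least one full sleep interval, so a longer sleep should show up directly in the elapsed time.

```java
// Hypothetical single-producer/single-consumer ring buffer, written only to
// illustrate the sleep-instead-of-spin pattern; NOT the netlet CircularBuffer.
class SleepOnFullBuffer {
    private final long[] slots;
    private final int mask;
    private final long sleepMillis;   // assumed tunable sleep interval
    private volatile long head = 0;   // advanced only by the consumer
    private volatile long tail = 0;   // advanced only by the producer

    SleepOnFullBuffer(int capacityPowerOfTwo, long sleepMillis) {
        this.slots = new long[capacityPowerOfTwo];
        this.mask = capacityPowerOfTwo - 1;
        this.sleepMillis = sleepMillis;
    }

    /** Adds a value; sleeps (instead of spinning) while the buffer is full. */
    void put(long value) {
        while (tail - head == slots.length) {
            pause(sleepMillis);
        }
        slots[(int) (tail & mask)] = value;
        tail = tail + 1;              // volatile write publishes the slot
    }

    /** Removes the next value; sleeps while the buffer is empty. */
    long take() {
        while (head == tail) {
            pause(sleepMillis);
        }
        long value = slots[(int) (head & mask)];
        head = head + 1;              // volatile write frees the slot
        return value;
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** Moves the values 1..count through the buffer on two threads and returns their sum. */
    static long transfer(int count, int capacity, long sleepMillis) {
        SleepOnFullBuffer buffer = new SleepOnFullBuffer(capacity, sleepMillis);
        Thread producer = new Thread(() -> {
            for (long i = 1; i <= count; i++) {
                buffer.put(i);
            }
        });
        producer.start();
        long sum = 0;
        for (int i = 0; i < count; i++) {
            sum += buffer.take();
        }
        try {
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sum;
    }

    public static void main(String[] args) {
        // Arbitrary sleep values, chosen only to compare elapsed times.
        for (long sleep : new long[] {0, 1, 5}) {
            long start = System.nanoTime();
            long sum = transfer(20_000, 64, sleep);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.printf("sleep=%dms sum=%d elapsed=%dms%n", sleep, sum, elapsedMs);
        }
    }
}
```

Running main with different sleep values gives a rough feel for how much of the elapsed time is spent sleeping rather than moving tuples; the actual numbers depend on the machine and the timer resolution.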

On 9/26/15 20:59, Amol Kekre wrote:
A good read -
http://preshing.com/20111118/locks-arent-slow-lock-contention-is/

Though it does not explain an order-of-magnitude difference.

Amol


On Sat, Sep 26, 2015 at 4:25 PM, Vlad Rozov <[email protected]> wrote:

In the benchmark test THREAD_LOCAL outperforms CONTAINER_LOCAL by an order
of magnitude, and both operators compete for CPU. I'll take a closer look
at why.

Thank you,

Vlad


On 9/26/15 14:52, Thomas Weise wrote:

THREAD_LOCAL - operators share thread
CONTAINER_LOCAL - each operator has its own thread

So as long as operators utilize the CPU sufficiently (compete), the latter
will perform better.

There will be cases where a single thread can accommodate multiple
operators. For example, a socket reader (mostly waiting for IO) and a
decompress (CPU hungry) can share a thread.
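
As a rough illustration of the difference between the two localities described above, the sketch below models them with plain JDK classes. It is a simplification under stated assumptions, not engine code: LocalitySketch, Counter, sharedThread, and ownThreads are hypothetical names, and an ArrayBlockingQueue stands in for whatever the engine actually uses to hand tuples between threads.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.LongConsumer;

// Illustration only: hypothetical names, not Apex engine classes.
class LocalitySketch {
    /** Downstream "operator": sums the tuples it receives. */
    static final class Counter implements LongConsumer {
        long total;

        @Override
        public void accept(long tuple) {
            total += tuple;
        }
    }

    /** THREAD_LOCAL style: the emitter calls the downstream operator directly on its own thread. */
    static long sharedThread(int tuples) {
        Counter counter = new Counter();
        for (long i = 1; i <= tuples; i++) {
            counter.accept(i);
        }
        return counter.total;
    }

    /** CONTAINER_LOCAL style: each operator has its own thread; tuples cross a bounded queue. */
    static long ownThreads(int tuples) {
        BlockingQueue<Long> queue = new ArrayBlockingQueue<>(1024);
        Counter counter = new Counter();
        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < tuples; i++) {
                    counter.accept(queue.take());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        try {
            for (long i = 1; i <= tuples; i++) {
                queue.put(i);
            }
            consumer.join();   // join makes counter.total visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return counter.total;
    }

    public static void main(String[] args) {
        System.out.println(sharedThread(1_000_000));
        System.out.println(ownThreads(1_000_000));
    }
}
```

Both methods compute the same sum. The second one pays for the hand-off between threads, which is only worthwhile when each operator does enough work per tuple to keep its own core busy.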

But to get back to the original question, stream locality generally does
not reduce the total memory requirement. If you add multiple operators into
one container, that container will also require more memory, and that's how
the container size is calculated in the physical plan. You may get some
extra mileage when multiple operators share the same heap, but the need to
identify the memory requirement per operator does not go away.
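
A small worked example of that arithmetic, assuming the container size is simply the sum of the MEMORY_MB values of the operators placed in it (any additional memory the engine may add is ignored here). The 512, 200, and 500 figures are the ones from Ram's config quoted further down; the operator names opA, opB, and opC and the operator count are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of the Apex API.
class ContainerMemorySketch {
    /** Sums the per-operator MEMORY_MB values for operators sharing one container. */
    static int containerMemoryMb(Map<String, Integer> operatorMemoryMb) {
        int total = 0;
        for (int mb : operatorMemoryMb.values()) {
            total += mb;
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> operators = new LinkedHashMap<>();
        operators.put("fileWordCount", 512); // value from the quoted config
        operators.put("opA", 200);           // hypothetical operators at the 200 MB setting
        operators.put("opB", 200);
        operators.put("opC", 200);

        int operatorsMb = containerMemoryMb(operators); // 512 + 3 * 200 = 1112
        int masterMb = 500;                             // MASTER_MEMORY_MB from the quoted config
        System.out.println("operator container: " + operatorsMb + " MB");
        System.out.println("with master: " + (operatorsMb + masterMb) + " MB");
    }
}
```

Putting the operators in one container changes where the memory is requested, not how much is needed in total.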

Thomas


On Sat, Sep 26, 2015 at 12:41 PM, Munagala Ramanath <[email protected]> wrote:

Would CONTAINER_LOCAL achieve the same thing and perform a little better
on a multi-core box?

Ram

On Sat, Sep 26, 2015 at 12:18 PM, Sandeep Deshmukh <[email protected]> wrote:

Yes, with this approach only two containers are required: one for stram
and another for all operators. You can easily fit around 10 operators in
less than 1GB.

On 27 Sep 2015 00:32, "Timothy Farkas" <[email protected]> wrote:

Hi Ram,
You could make all the operators thread local. This cuts down on the
overhead of separate containers and maximizes the memory available to
each operator.

Tim

On Sat, Sep 26, 2015 at 10:07 AM, Munagala Ramanath <[email protected]> wrote:

Hi,
I was running into memory issues when deploying my app on the sandbox
where all the operators were stuck forever in the PENDING state because
they were being continually aborted and restarted because of the limited
memory on the sandbox. After some experimentation, I found that the
following config values seem to work:
------------------------------------------
<https://datatorrent.slack.com/archives/engineering/p1443263607000010>
<property>
  <name>dt.attr.MASTER_MEMORY_MB</name>
  <value>500</value>
</property>
<property>
  <name>dt.application.*.operator.*.attr.MEMORY_MB</name>
  <value>200</value>
</property>
<property>
  <name>dt.application.TopNWordsWithQueries.operator.fileWordCount.attr.MEMORY_MB</name>
  <value>512</value>
</property>
------------------------------------------------
Are these reasonable values? Is there a more systematic way of coming up
with these values than trial and error? Most of my operators -- with the
exception of fileWordCount -- need very little memory; is there a way to
cut all values down to the bare minimum and maximize available memory for
this one operator?
Thanks.

Ram


