Re: Thread and Container locality

Chetan Narsude Tue, 29 Sep 2015 16:58:44 -0700

Vlad, what was the number without this fix?

--
Chetan


On Tue, Sep 29, 2015 at 4:48 PM, Vlad Rozov <[email protected]> wrote:

> I did a quick prototype that uses the JCTools (http://jctools.github.io/JCTools)
> SPSC bounded queue instead of CircularBuffer. For container local I now see
> 13 million tuples per second.
>
> Thank you,
>
> Vlad
>
>
> On 9/28/15 12:58, Chetan Narsude wrote:
>
>> Let me shed some light on THREAD_LOCAL and CONTAINER_LOCAL.
>>
>> THREAD_LOCAL at the core is nothing but a function call. When an operator
>> does emit(tuple), it gets translated into the downstream port's
>> process(tuple) call, which is invoked immediately in the same thread. So
>> obviously the performance is going to be a lot faster. The only thing
>> happening in between is setting up the stack and invoking the function.
>>
>> With CONTAINER_LOCAL there is a producer thread and a consumer thread
>> involved. The producer produces (emit(tuple)) and the consumer consumes
>> (process(tuple)). This scheme is optimal when the rate at which the
>> producer produces equals the rate at which the consumer consumes. Often
>> that's not the case, so we have a bounded memory buffer in between (the
>> implementation is CircularBuffer). Now, in addition to the things that
>> THREAD_LOCAL does, the CONTAINER_LOCAL pattern requires managing the
>> circular buffer *and* a thread context switch. The most expensive part of
>> the thread context switch is the memory synchronization. As you all have
>> pointed out how expensive it is to use volatile, I need not get into the
>> details of how expensive memory synchronization can get.
>>
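To make the two patterns above concrete, here is a minimal Java sketch. It is not the platform's code: the class and method names are made up for illustration, and java.util.concurrent.ArrayBlockingQueue stands in for CircularBuffer purely as an assumption (it uses a lock, whereas the thread says CircularBuffer uses volatile). The first method hands each tuple over with a plain call; the second pushes it through a bounded buffer to a second thread.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.LongConsumer;

public class LocalitySketch {
    /** Hypothetical downstream operator: sums the tuples it receives. */
    static final class Summer implements LongConsumer {
        long sum;
        @Override public void accept(long tuple) { sum += tuple; }
    }

    /** THREAD_LOCAL-like: emit(tuple) is a direct call into process(tuple). */
    static long threadLocal(long tuples) {
        Summer downstream = new Summer();
        for (long i = 1; i <= tuples; i++) {
            downstream.accept(i); // same thread, just a function call
        }
        return downstream.sum;
    }

    /** CONTAINER_LOCAL-like: producer and consumer threads share a bounded buffer. */
    static long containerLocal(long tuples, int capacity) {
        BlockingQueue<Long> buffer = new ArrayBlockingQueue<>(capacity);
        Summer downstream = new Summer();
        Thread consumer = new Thread(() -> {
            try {
                for (long i = 0; i < tuples; i++) {
                    downstream.accept(buffer.take()); // blocks while the buffer is empty
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        try {
            for (long i = 1; i <= tuples; i++) {
                buffer.put(i); // blocks while the buffer is full
            }
            consumer.join(); // join() also makes the consumer's writes visible here
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return downstream.sum;
    }

    public static void main(String[] args) {
        long n = 1_000_000;
        long t0 = System.nanoTime();
        long a = threadLocal(n);
        long t1 = System.nanoTime();
        long b = containerLocal(n, 1024);
        long t2 = System.nanoTime();
        System.out.printf("direct call:   sum=%d in %d ms%n", a, (t1 - t0) / 1_000_000);
        System.out.printf("queue handoff: sum=%d in %d ms%n", b, (t2 - t1) / 1_000_000);
    }
}
```

Both methods compute the same sum; only the handoff differs. Timings from a toy like this depend heavily on the JIT and the machine, so treat them as illustrative only.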
>> Long story short: no matter which pattern you use, when you use more than
>> one thread there are certain memory synchronization penalties which are
>> unavoidable and slow things down considerably. In 2012, I benchmarked
>> atomic, volatile, and synchronized, and in that benchmark (I think there
>> are unit tests for it) I found volatile to be the least expensive at that
>> time. Synchronized was not far behind (it's very efficient when the
>> contention is likely to be among a single-digit number of threads). Not
>> sure how those benchmarks would look today, but you get the idea.
>>
>> In a data-intensive app, most of the time is spent in IO and there is a lot
>> of CPU idling at individual operators, so you will not see a difference
>> when you change CONTAINER_LOCAL to THREAD_LOCAL. Yet you will see some
>> memory optimization, as you are taking away the intermediate memory-based
>> buffer *and* the delayed garbage collection of the objects held by this
>> buffer.
>>
>> Recommendation: Do not bother with these micro-optimizations unless you
>> notice a problem. Use THREAD_LOCAL for processing low-throughput/infrequent
>> streams. Use CONTAINER_LOCAL to avoid serialization/deserialization of
>> objects. Leave the rest to the platform. I expect that as it matures it
>> will make most of these decisions automatically.
>>
>> HTH.
>>
>> --
>> Chetan
>>
>> On Mon, Sep 28, 2015 at 11:44 AM, Vlad Rozov <[email protected]>
>> wrote:
>>
>>> Hi Tim,
>>>
>>> I use the benchmark application that is part of the Apache Malhar project.
>>> Please let me know if you need help with compiling or running the
>>> application.
>>>
>>> Thank you,
>>>
>>> Vlad
>>>
>>>
>>> On 9/28/15 11:09, Timothy Farkas wrote:
>>>
>>>> Also sharing a diff:
>>>>
>>>> https://github.com/DataTorrent/Netlet/compare/master...ilooner:condVarBuffer
>>>>
>>>> Thanks,
>>>> Tim
>>>>
>>>> On Mon, Sep 28, 2015 at 10:07 AM, Timothy Farkas <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Vlad,
>>>>>
>>>>> Could you share your benchmarking applications? I'd like to test a change
>>>>> I made to the CircularBuffer:
>>>>>
>>>>> https://github.com/ilooner/Netlet/blob/condVarBuffer/src/main/java/com/datatorrent/netlet/util/CircularBuffer.java
>>>>>
>>>>> Thanks,
>>>>> Tim
>>>>>
>>>>> On Mon, Sep 28, 2015 at 9:56 AM, Pramod Immaneni <
>>>>> [email protected]
>>>>> wrote:
>>>>>
>>>>>> Vlad, what was your mode of interaction/ordering between the two threads
>>>>>> for the 3rd test?
>>>>>>
>>>>>> On Mon, Sep 28, 2015 at 10:51 AM, Vlad Rozov <[email protected]
>>>>>> >
>>>>>> wrote:
>>>>>>
>>>>>>> I created a simple test to check how quickly Java can count to
>>>>>>> Integer.MAX_VALUE. The result that I see is consistent with
>>>>>>> CONTAINER_LOCAL behavior:
>>>>>>>
>>>>>>> counting long in a single thread: 0.9 sec
>>>>>>> counting volatile long in a single thread: 17.7 sec
>>>>>>> counting volatile long shared between two threads: 186.3 sec
>>>>>>>
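The test code itself was not posted in this thread, so the following is only a guess at what such a counting test might look like. The class name is hypothetical, and the two-thread mode (the threads take turns, one incrementing even values and the other odd values) is an assumption, since the interaction between the threads was asked about but not answered here.

```java
public class CountingSketch {
    static long plain;          // ordinary field: the JIT may keep it in a register
    static volatile long vol;   // volatile field: every access goes through memory ordering

    /** Counts a plain long up to limit in the calling thread. */
    static long countPlain(long limit) {
        plain = 0;
        while (plain < limit) {
            plain++;
        }
        return plain;
    }

    /** Counts a volatile long up to limit in the calling thread. */
    static long countVolatile(long limit) {
        vol = 0;
        while (vol < limit) {
            vol++; // not atomic, but safe here because only one thread touches it
        }
        return vol;
    }

    /** Two threads take turns incrementing the same volatile long (assumed mode). */
    static long countVolatileShared(long limit) {
        vol = 0;
        Thread even = new Thread(() -> takeTurns(0, limit));
        Thread odd = new Thread(() -> takeTurns(1, limit));
        even.start();
        odd.start();
        try {
            even.join();
            odd.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return vol;
    }

    /** Increments vol only when its current value has the given parity. */
    private static void takeTurns(long parity, long limit) {
        while (true) {
            long v = vol;
            if (v >= limit) {
                return;
            }
            if ((v & 1) == parity) {
                vol = v + 1; // single writer per value, so no update is lost
            }
        }
    }

    public static void main(String[] args) {
        // Much smaller than Integer.MAX_VALUE so the sketch finishes quickly.
        long limit = args.length > 0 ? Long.parseLong(args[0]) : 10_000_000L;
        long t0 = System.nanoTime();
        countPlain(limit);
        long t1 = System.nanoTime();
        countVolatile(limit);
        long t2 = System.nanoTime();
        countVolatileShared(limit);
        long t3 = System.nanoTime();
        System.out.printf("plain long, one thread:      %d ms%n", (t1 - t0) / 1_000_000);
        System.out.printf("volatile long, one thread:   %d ms%n", (t2 - t1) / 1_000_000);
        System.out.printf("volatile long, two threads:  %d ms%n", (t3 - t2) / 1_000_000);
    }
}
```

A loop like this is a crude microbenchmark; the JIT can optimize the plain loop heavily, so the absolute numbers will differ from the ones quoted above.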
>>>>>>> I suggest that we look into
>>>>>>> https://qconsf.com/sf2012/dl/qcon-sanfran-2012/slides/MartinThompson_LockFreeAlgorithmsForUltimatePerformanceMOVEDTOBALLROOMA.pdf
>>>>>>> or a similar algorithm.
>>>>>>>
>>>>>>> Thank you,
>>>>>>>
>>>>>>> Vlad
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 9/28/15 08:19, Vlad Rozov wrote:
>>>>>>>
>>>>>>>> Ram,
>>>>>>>>
>>>>>>>> The stream between operators in the case of CONTAINER_LOCAL is
>>>>>>>> InlineStream. InlineStream extends DefaultReservoir, which extends
>>>>>>>> CircularBuffer. CircularBuffer does not use synchronized methods or
>>>>>>>> locks, it uses volatile. I guess that using volatile causes CPU cache
>>>>>>>> invalidation, and along with memory locality (in the thread local case
>>>>>>>> the tuple is always local to both threads, while in the container
>>>>>>>> local case the second operator thread may see data significantly
>>>>>>>> later after the first thread produced it) these two factors
>>>>>>>> negatively impact CONTAINER_LOCAL performance. It is still quite
>>>>>>>> surprising that the impact is so significant.
>>>>>>>>
>>>>>>>
>>>>>>>> Thank you,
>>>>>>>>
>>>>>>>> Vlad
>>>>>>>>
>>>>>>>> On 9/27/15 16:45, Munagala Ramanath wrote:
>>>>>>>>
>>>>>>>>> Vlad,
>>>>>>>>>
>>>>>>>>> That's a fascinating and counter-intuitive result. I wonder if some
>>>>>>>>> internal synchronization is happening (maybe the stream between them
>>>>>>>>> is a shared data structure that is lock protected) to slow down the
>>>>>>>>> 2 threads in the CONTAINER_LOCAL case. If they are both going as fast
>>>>>>>>> as possible it is likely that they will be frequently blocked by the
>>>>>>>>> lock. If that is indeed the case, some sort of lock striping or a
>>>>>>>>> near-lockless protocol for stream access should tilt the balance in
>>>>>>>>> favor of CONTAINER_LOCAL.
>>>>>>>>>
>>>>>>>>> In the thread-local case of course there is no need for such locking.
>>>>>>>>>
>>>>>>>>> Ram
>>>>>>>>>
>>>>>>>>> On Sun, Sep 27, 2015 at 12:17 PM, Vlad Rozov <
>>>>>>>>> [email protected]
>>>>>>>>> <mailto:[email protected]>> wrote:
>>>>>>>>>
>>>>>>>>>       Changed subject to reflect shift of discussion.
>>>>>>>>>
>>>>>>>>>       After I recompiled netlet and hardcoded 0 wait time in the
>>>>>>>>>       CircularBuffer.put() method, I still see the same difference, even
>>>>>>>>>       when I increased operator memory to 10 GB and set "-D
>>>>>>>>>       dt.application.*.operator.*.attr.SPIN_MILLIS=0 -D
>>>>>>>>>       dt.application.*.operator.*.attr.QUEUE_CAPACITY=1024000". CPU % is
>>>>>>>>>       close to 100% for both thread local and container local locality
>>>>>>>>>       settings. Note that in thread local the two operators share 100% CPU,
>>>>>>>>>       while in container local each gets its own 100% load. It seems that
>>>>>>>>>       container local will outperform thread local only when the number of
>>>>>>>>>       emitted tuples is (relatively) low, for example when it is CPU costly
>>>>>>>>>       to produce tuples (hash computations, compression/decompression,
>>>>>>>>>       aggregations, filtering with complex expressions). In cases where an
>>>>>>>>>       operator may emit 5 or more million tuples per second, thread local
>>>>>>>>>       may outperform container local even when both operators are CPU
>>>>>>>>>       intensive.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>       Thank you,
>>>>>>>>>
>>>>>>>>>       Vlad
>>>>>>>>>
>>>>>>>>>       On 9/26/15 22:52, Timothy Farkas wrote:
>>>>>>>>>
>>>>>>>>>>       Hi Vlad,
>>>>>>>>>>
>>>>>>>>>>       I just took a look at the CircularBuffer. Why are threads polling
>>>>>>>>>>       the state of the buffer before doing operations? Couldn't polling
>>>>>>>>>>       be avoided entirely by using something like Condition variables to
>>>>>>>>>>       signal when the buffer is ready for an operation to be performed?
>>>>>>>>>>
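For reference, the condition-variable idea can be sketched roughly as below. This is not the code from the condVarBuffer branch linked in this thread and not the real CircularBuffer; the class name SignalingBuffer and the demo helper are hypothetical. It uses java.util.concurrent.locks.ReentrantLock with two Condition objects so that a thread waits to be signalled instead of sleeping and polling.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class SignalingBuffer<T> {
    private final Object[] slots;
    private int head;   // index of the next item to take
    private int tail;   // index of the next free slot
    private int size;   // number of items currently stored
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notFull = lock.newCondition();
    private final Condition notEmpty = lock.newCondition();

    public SignalingBuffer(int capacity) {
        slots = new Object[capacity];
    }

    /** Blocks until there is room, then stores the item and wakes a waiting consumer. */
    public void put(T item) throws InterruptedException {
        lock.lock();
        try {
            while (size == slots.length) {
                notFull.await(); // releases the lock while waiting
            }
            slots[tail] = item;
            tail = (tail + 1) % slots.length;
            size++;
            notEmpty.signal();
        } finally {
            lock.unlock();
        }
    }

    /** Blocks until an item is available, then removes it and wakes a waiting producer. */
    @SuppressWarnings("unchecked")
    public T take() throws InterruptedException {
        lock.lock();
        try {
            while (size == 0) {
                notEmpty.await();
            }
            T item = (T) slots[head];
            slots[head] = null; // drop the reference so it can be collected
            head = (head + 1) % slots.length;
            size--;
            notFull.signal();
            return item;
        } finally {
            lock.unlock();
        }
    }

    /** Hypothetical helper: one producer, one consumer, returns the sum received. */
    static long demo(int tuples, int capacity) {
        SignalingBuffer<Integer> buf = new SignalingBuffer<>(capacity);
        long[] sum = new long[1];
        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < tuples; i++) {
                    sum[0] += buf.take();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        try {
            for (int i = 1; i <= tuples; i++) {
                buf.put(i);
            }
            consumer.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return sum[0];
    }
}
```

This removes the sleep-and-poll loop but takes a lock on every put and take, so whether it is faster than a volatile-based buffer would have to be measured.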
>>>>>>>>>>       Tim
>>>>>>>>>>
>>>>>>>>>>       On Sat, Sep 26, 2015 at 10:42 PM, Vlad Rozov<
>>>>>>>>>> [email protected]> <mailto:[email protected]>
>>>>>>>>>>       wrote:
>>>>>>>>>>
>>>>>>>>>>>       After looking at a few stack traces I think that in the benchmark
>>>>>>>>>>>       application operators compete for the circular buffer that passes
>>>>>>>>>>>       slices from the emitter output to the consumer input, and the
>>>>>>>>>>>       sleeps that avoid busy wait are too long for the benchmark
>>>>>>>>>>>       operators. I don't see a stack similar to the one below every
>>>>>>>>>>>       time I take a thread dump, but still often enough to suspect that
>>>>>>>>>>>       sleep is the root cause. I'll recompile with a smaller sleep time
>>>>>>>>>>>       and see how this affects performance.
>>>>>>>>>>>
>>>>>>>>>>>       ----
>>>>>>>>>>>       "1/wordGenerator:RandomWordInputModule" prio=10 tid=0x00007f78c8b8c000 nid=0x780f waiting on condition [0x00007f78abb17000]
>>>>>>>>>>>          java.lang.Thread.State: TIMED_WAITING (sleeping)
>>>>>>>>>>>           at java.lang.Thread.sleep(Native Method)
>>>>>>>>>>>           at com.datatorrent.netlet.util.CircularBuffer.put(CircularBuffer.java:182)
>>>>>>>>>>>           at com.datatorrent.stram.stream.InlineStream.put(InlineStream.java:79)
>>>>>>>>>>>           at com.datatorrent.stram.stream.MuxStream.put(MuxStream.java:117)
>>>>>>>>>>>           at com.datatorrent.api.DefaultOutputPort.emit(DefaultOutputPort.java:48)
>>>>>>>>>>>           at com.datatorrent.benchmark.RandomWordInputModule.emitTuples(RandomWordInputModule.java:108)
>>>>>>>>>>>           at com.datatorrent.stram.engine.InputNode.run(InputNode.java:115)
>>>>>>>>>>>           at com.datatorrent.stram.engine.StreamingContainer$2.run(StreamingContainer.java:1377)
>>>>>>>>>>>
>>>>>>>>>>>       "2/counter:WordCountOperator" prio=10 tid=0x00007f78c8c98800 nid=0x780d waiting on condition [0x00007f78abc18000]
>>>>>>>>>>>          java.lang.Thread.State: TIMED_WAITING (sleeping)
>>>>>>>>>>>           at java.lang.Thread.sleep(Native Method)
>>>>>>>>>>>           at com.datatorrent.stram.engine.GenericNode.run(GenericNode.java:519)
>>>>>>>>>>>           at com.datatorrent.stram.engine.StreamingContainer$2.run(StreamingContainer.java:1377)
>>>>>>>>>>>       ----
>>>>>>>
>>>>>>>>
>>>>>>>>>>>       On 9/26/15 20:59, Amol Kekre wrote:
>>>>>>>>>>>
>>>>>>>>>>>>       A good read:
>>>>>>>>>>>>       http://preshing.com/20111118/locks-arent-slow-lock-contention-is/
>>>>>>>>>>>>
>>>>>>>>>>>>       Though it does not explain the order of magnitude difference.
>>>>>>>>>>>>
>>>>>>>>>>>>       Amol
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>       On Sat, Sep 26, 2015 at 4:25 PM, Vlad Rozov<
>>>>>>>>>>>> [email protected]> <mailto:[email protected]>
>>>>>>>>>>>>       wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>>       In the benchmark test THREAD_LOCAL outperforms CONTAINER_LOCAL
>>>>>>>>>>>>>       by an order of magnitude and both operators compete for CPU.
>>>>>>>>>>>>>       I'll take a closer look at why.
>>>>>>>>>>>>>
>>>>>>>>>>>>>       Thank you,
>>>>>>>>>>>>>
>>>>>>>>>>>>>       Vlad
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>       On 9/26/15 14:52, Thomas Weise wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>>       THREAD_LOCAL - operators share a thread
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>       CONTAINER_LOCAL - each operator has its own thread
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>       So as long as operators utilize the CPU sufficiently (compete),
>>>>>>>>>>>>>>       the latter will perform better.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>       There will be cases where a single thread can accommodate
>>>>>>>>>>>>>>       multiple operators. For example, a socket reader (mostly waiting
>>>>>>>>>>>>>>       for IO) and a decompressor (CPU hungry) can share a thread.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>       But to get back to the original question, stream locality does
>>>>>>>>>>>>>>       generally not reduce the total memory requirement. If you add
>>>>>>>>>>>>>>       multiple operators into one container, that container will also
>>>>>>>>>>>>>>       require more memory, and that's how the container size is
>>>>>>>>>>>>>>       calculated in the physical plan. You may get some extra mileage
>>>>>>>>>>>>>>       when multiple operators share the same heap, but the need to
>>>>>>>>>>>>>>       identify the memory requirement per operator does not go away.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>       Thomas
>>>>>>>
>>>>>>>>
>>>>>>>>>>>>>>       On Sat, Sep 26, 2015 at 12:41 PM, Munagala Ramanath <
>>>>>>>>>>>>>>       [email protected] <mailto:[email protected]>>
>>>>>>>>>>>>>>       wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>       Would CONTAINER_LOCAL achieve the same thing and perform a
>>>>>>>>>>>>>>>       little better on a multi-core box?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>       Ram
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>       On Sat, Sep 26, 2015 at 12:18 PM, Sandeep Deshmukh <
>>>>>>>>>>>>>>>       [email protected] <mailto:
>>>>>>>>>>>>>>> [email protected]>>
>>>>>>>>>>>>>>>       wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>       Yes, with this approach only two containers are required:
>>>>>>>>>>>>>>>>       one for stram and another for all operators. You can easily
>>>>>>>>>>>>>>>>       fit around 10 operators in less than 1GB.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>       On 27 Sep 2015 00:32, "Timothy Farkas"<
>>>>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>>>> <mailto:[email protected]>  wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>       Hi Ram,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>       You could make all the operators thread local. This cuts
>>>>>>>>>>>>>>>>>       down on the overhead of separate containers and maximizes
>>>>>>>>>>>>>>>>>       the memory available to each operator.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>       Tim
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>       On Sat, Sep 26, 2015 at 10:07 AM, Munagala Ramanath <
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>       [email protected] <mailto:[email protected]>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>       wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>       Hi,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>       I was running into memory issues when deploying my app on
>>>>>>>>>>>>>>>>>>       the sandbox, where all the operators were stuck forever in
>>>>>>>>>>>>>>>>>>       the PENDING state because they were being continually
>>>>>>>>>>>>>>>>>>       aborted and restarted because of the limited memory on the
>>>>>>>>>>>>>>>>>>       sandbox. After some experimentation, I found that the
>>>>>>>>>>>>>>>>>>       following config values seem to work:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>       ------------------------------------------
>>>>>>>>>>>>>>>>>>       <https://datatorrent.slack.com/archives/engineering/p1443263607000010>
>>>>>>>>>>>>>>>>>>       <property>
>>>>>>>>>>>>>>>>>>         <name>dt.attr.MASTER_MEMORY_MB</name>
>>>>>>>>>>>>>>>>>>         <value>500</value>
>>>>>>>>>>>>>>>>>>       </property>
>>>>>>>>>>>>>>>>>>       <property>
>>>>>>>>>>>>>>>>>>         <name>dt.application.*.operator.*.attr.MEMORY_MB</name>
>>>>>>>>>>>>>>>>>>         <value>200</value>
>>>>>>>>>>>>>>>>>>       </property>
>>>>>>>>>>>>>>>>>>       <property>
>>>>>>>>>>>>>>>>>>         <name>dt.application.TopNWordsWithQueries.operator.fileWordCount.attr.MEMORY_MB</name>
>>>>>>>>>>>>>>>>>>         <value>512</value>
>>>>>>>>>>>>>>>>>>       </property>
>>>>>>>>>>>>>>>>>>       ------------------------------------------------
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>       Are these reasonable values? Is there a more systematic
>>>>>>>>>>>>>>>>>>       way of coming up with these values than trial-and-error?
>>>>>>>>>>>>>>>>>>       Most of my operators -- with the exception of
>>>>>>>>>>>>>>>>>>       fileWordCount -- need very little memory; is there a way
>>>>>>>>>>>>>>>>>>       to cut all values down to the bare minimum and maximize
>>>>>>>>>>>>>>>>>>       available memory for this one operator?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>       Thanks.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>       Ram
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>
