[jira] [Updated] (TEZ-3680) Optimizations to UnorderedPartitionedKVWriter

Rajesh Balamohan (JIRA) Thu, 06 Apr 2017 23:30:07 -0700

     [ 
https://issues.apache.org/jira/browse/TEZ-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Rajesh Balamohan updated TEZ-3680:
----------------------------------
    Attachment: TEZ-3680.4.patch

SpillCallable does not notify per partition - is this required? (during the 
last spill?)
- This is not needed, as it notifies immediately after scheduling.  Even if it 
does not spill, it ends up notifying once during close.

newCachedPool - a cached pool should be ok here, correct? since we don't expect 
too many buffers to be used. (Was thinking ThreadPool with min/max in the last 
comment, but I think cachedPool will work in this case)
- A cached thread pool has an unbounded number of threads, but this is a 
scenario where users are not going to create 1000s of buffers. Still, replaced 
it with a "ThreadPoolExecutor" with min/max, which is what cachedThreadPool 
uses internally.
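
For illustration only, a minimal sketch of the min/max idea, not the code from 
the patch. Executors.newCachedThreadPool() is a ThreadPoolExecutor with 0 core 
threads, Integer.MAX_VALUE max threads, a 60s keep-alive and a 
SynchronousQueue. The class name, the 1/4 thread counts and the 
CallerRunsPolicy fallback below are assumptions made for the example.

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedSpillPool {

    // Hypothetical helper: same shape as a cached pool, but with an upper
    // bound on the number of threads.
    static ThreadPoolExecutor newBoundedPool(int minThreads, int maxThreads) {
        return new ThreadPoolExecutor(
            minThreads, maxThreads,
            60L, TimeUnit.SECONDS,              // idle threads above min are reclaimed
            new SynchronousQueue<Runnable>(),   // direct hand-off, no task queue
            // Assumption: when all maxThreads are busy, run the task on the
            // submitting thread instead of rejecting it.
            new ThreadPoolExecutor.CallerRunsPolicy());
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = newBoundedPool(1, 4);
        for (int i = 0; i < 16; i++) {
            final int id = i;
            pool.execute(() -> System.out.println(
                "spill " + id + " on " + Thread.currentThread().getName()));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```

With a SynchronousQueue the pool never queues work, so a bounded max needs 
some policy for the case where every thread is busy. Running on the caller is 
one option; it effectively throttles the writer while spills catch up.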

In terms of the notify after N records, don't think buffer size should affect 
that. In fact, a large buffer size with small records can lead to infrequent 
notifications, which can cause timeouts. Afaik, PIG relies on these 
notifications for timeouts.
+1, after addressing these comments.
- Added notification for every 1000 records.
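
For illustration only, a minimal sketch of notifying once every 1000 records 
plus once during close, as described above. The class and method names are 
hypothetical and the Runnable stands in for whatever progress callback the 
writer actually uses; this is not the code from the patch.

```java
public class ThrottledNotifier {

    // Interval taken from the comment above: notify every 1000 records.
    static final int NOTIFY_INTERVAL = 1000;

    private final Runnable notifier;   // stand-in for the real progress callback
    private long records;

    ThrottledNotifier(Runnable notifier) {
        this.notifier = notifier;
    }

    // Called once per record written; notifies only on every 1000th record,
    // independent of buffer size.
    void recordWritten() {
        records++;
        if (records % NOTIFY_INTERVAL == 0) {
            notifier.run();
        }
    }

    // Notifies once more on close, so short outputs still report progress.
    void close() {
        notifier.run();
    }

    public static void main(String[] args) {
        int[] calls = {0};
        ThrottledNotifier n = new ThrottledNotifier(() -> calls[0]++);
        for (int i = 0; i < 2500; i++) {
            n.recordWritten();
        }
        n.close();
        // 2 notifications from records (at 1000 and 2000) plus 1 from close.
        System.out.println("notifications: " + calls[0]); // prints "notifications: 3"
    }
}
```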

> Optimizations to UnorderedPartitionedKVWriter
> ---------------------------------------------
>
>                 Key: TEZ-3680
>                 URL: https://issues.apache.org/jira/browse/TEZ-3680
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>         Attachments: profiler.png, TEZ-3680.1.patch, TEZ-3680.2.patch, 
> TEZ-3680.3.patch, TEZ-3680.4.patch
>
>
> 1. Consider increasing the number of threads in spill executor. 
> {{TEZ_RUNTIME_UNORDERED_OUTPUT_MAX_PER_BUFFER_SIZE_BYTES}} can be used to 
> configure the buffer size. If smaller buffer sizes are provided, there is a 
> chance of getting frequent spills; currently the spill executor operates in 
> single threaded mode.
> 2. During profiling, operations like incrementing the counters and notifying 
> progress showed up. This may not be common in regular tez jobs. But in 
> processes like LLAP (hive based), it is possible to get into such situations. 
> I will attach the profiler snapshot showing this. It would be good to 
> update/notify less frequently.
> 3. Optimize mergeAll().



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
