[ https://issues.apache.org/jira/browse/HIVE-17174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16104541#comment-16104541 ]
Gopal V commented on HIVE-17174:
--------------------------------
LGTM - +1.
Minor nit on {{llap.shuffle.os.cache.optimize.evict}}: rename it to
"always.evict" and flip the values, so that it is easier to document.
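A hypothetical sketch of the suggested semantics. It combines the renamed flag with the "partition != 0" rule proposed in the issue quoted below. With the flag flipped to an "always evict" switch, the default (false) keeps partition 0's pages cached and evicts every other partition after transfer. Setting it to true restores unconditional eviction. The class name, the full property key {{llap.shuffle.os.cache.always.evict}}, and the false default are assumptions for illustration, not taken from the patch.

```java
import java.util.Map;

// Hypothetical policy class; the key and default below are assumptions,
// and the committed patch may use different names or defaults.
public final class EvictPolicy {
    static final String ALWAYS_EVICT_KEY = "llap.shuffle.os.cache.always.evict";

    private final boolean alwaysEvict;

    EvictPolicy(Map<String, String> conf) {
        // Assumed default "false": retain partition 0 pages unless told otherwise.
        this.alwaysEvict = Boolean.parseBoolean(conf.getOrDefault(ALWAYS_EVICT_KEY, "false"));
    }

    /** True if the pages for this partition should be dropped after transfer. */
    boolean shouldEvict(int partition) {
        // Partition 0 is kept cached (meant to retain broadcast outputs, per the issue),
        // unless the always-evict switch is on.
        return alwaysEvict || partition != 0;
    }

    public static void main(String[] args) {
        EvictPolicy defaults = new EvictPolicy(Map.of());
        EvictPolicy always = new EvictPolicy(Map.of(ALWAYS_EVICT_KEY, "true"));
        System.out.println(defaults.shouldEvict(0)); // false
        System.out.println(defaults.shouldEvict(3)); // true
        System.out.println(always.shouldEvict(0));   // true
    }
}
```

Naming the flag after the non-default behaviour means "false" is the documented, optimized default and "true" is the opt-out. This matches the motivation given in the comment for flipping the values.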
> LLAP: ShuffleHandler: optimize fadvise calls for broadcast edge
> ---------------------------------------------------------------
>
> Key: HIVE-17174
> URL: https://issues.apache.org/jira/browse/HIVE-17174
> Project: Hive
> Issue Type: Improvement
> Reporter: Rajesh Balamohan
> Assignee: Rajesh Balamohan
> Priority: Minor
> Attachments: HIVE-17174.1.patch
>
>
> Currently, once the data is transferred, an `fadvise` call is invoked to drop
> its pages from the OS page cache. This is not very helpful for broadcast
> edges, where the same data tends to be transferred to multiple downstream tasks.
> e.g. Q50 at 1 TB scale:
> {noformat}
> Edges:
> Map 1 <- Map 5 (BROADCAST_EDGE)
> Map 6 <- Reducer 2 (BROADCAST_EDGE), Reducer 3 (BROADCAST_EDGE), Reducer 4 (BROADCAST_EDGE)
> Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
> Reducer 3 <- Map 1 (CUSTOM_SIMPLE_EDGE)
> Reducer 4 <- Map 1 (CUSTOM_SIMPLE_EDGE)
> Reducer 7 <- Map 1 (CUSTOM_SIMPLE_EDGE), Map 10 (BROADCAST_EDGE), Map 11 (BROADCAST_EDGE), Map 6 (CUSTOM_SIMPLE_EDGE)
> Reducer 8 <- Reducer 7 (SIMPLE_EDGE)
> Reducer 9 <- Reducer 8 (SIMPLE_EDGE)
> Status: Running (Executing on YARN cluster with App id application_1490656001509_6084)
> ----------------------------------------------------------------------------------------------
>         VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> ----------------------------------------------------------------------------------------------
> Map 5 ..........      llap     SUCCEEDED      1          1        0        0       0       0
> Map 1 ..........      llap     SUCCEEDED     11         11        0        0       0       0
> Reducer 4 ......      llap     SUCCEEDED      1          1        0        0       0       0
> Reducer 2 ......      llap     SUCCEEDED      1          1        0        0       0       0
> Reducer 3 ......      llap     SUCCEEDED      1          1        0        0       0       0
> Map 6 ..........      llap     SUCCEEDED    139        139        0        0       0       0
> Map 10 .........      llap     SUCCEEDED      1          1        0        0       0       0
> Map 11 .........      llap     SUCCEEDED      1          1        0        0       0       0
> Reducer 7 ......      llap     SUCCEEDED    834        834        0        0       0       0
> Reducer 8 ......      llap     SUCCEEDED     24         24        0        0       0       0
> Reducer 9 ......      llap     SUCCEEDED      1          1        0        0       0       0
> ----------------------------------------------------------------------------------------------
> e.g. count of evictions per file:
> 139 /grid/3/hadoop/yarn/local/usercache/rbalamohan/appcache/application_1490656001509_6084/1/output/attempt_1490656001509_6084_1_05_000000_0_18387/file.out
> 834 /grid/3/hadoop/yarn/local/usercache/rbalamohan/appcache/application_1490656001509_6084/1/output/attempt_1490656001509_6084_1_07_000000_0_18420_1/file.out
> 834 /grid/3/hadoop/yarn/local/usercache/rbalamohan/appcache/application_1490656001509_6084/1/output/attempt_1490656001509_6084_1_07_000000_0_18420_2/file.out
>
> {noformat}
> It would be good to call fadvise only when "partition != 0". This would help
> retain the pages for broadcast outputs.
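The eviction counts in the log above (139 and 834) equal the task counts of Map 6 and Reducer 7. This is consistent with one eviction per downstream fetch of a broadcast output. The toy Java model below gives a rough idea of the cost. It is a hypothetical class, not Hive or Tez code. It assumes fetches are served one after another and that nothing else evicts the pages. It counts a disk read whenever a fetch finds the file uncached.

```java
// Toy model (not Hive/Tez code): one broadcast output file fetched by `fetchers` tasks.
// Assumptions: fetches are sequential and no other memory pressure evicts the pages.
public final class BroadcastCacheModel {

    /** Number of disk reads needed to serve `fetchers` sequential fetches. */
    static int diskReads(int fetchers, boolean evictAfterTransfer) {
        boolean cached = false;
        int reads = 0;
        for (int i = 0; i < fetchers; i++) {
            if (!cached) {
                reads++;        // cache miss: the pages come from disk
                cached = true;  // the read populates the page cache
            }
            if (evictAfterTransfer) {
                cached = false; // fadvise after the transfer drops the pages again
            }
        }
        return reads;
    }

    public static void main(String[] args) {
        // Fetcher counts taken from the task counts of Map 6 and Reducer 7 above.
        for (int fetchers : new int[] {139, 834}) {
            System.out.printf("fetchers=%d evict=%d retain=%d%n",
                fetchers, diskReads(fetchers, true), diskReads(fetchers, false));
        }
    }
}
```

Under these assumptions, evicting after every transfer costs one disk read per consumer (139 or 834), while retaining the pages needs a single read. Real fetches run concurrently and compete for memory, so actual savings will be smaller and depend on the workload.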