[ 
https://issues.apache.org/jira/browse/HIVE-17174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-17174:
------------------------------------
       Resolution: Fixed
     Hadoop Flags: Reviewed
    Fix Version/s: 3.0.0
           Status: Resolved  (was: Patch Available)

Thanks [~gopalv]. Committed to master.

> LLAP: ShuffleHandler: optimize fadvise calls for broadcast edge
> ---------------------------------------------------------------
>
>                 Key: HIVE-17174
>                 URL: https://issues.apache.org/jira/browse/HIVE-17174
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>            Priority: Minor
>             Fix For: 3.0.0
>
>         Attachments: HIVE-17174.1.patch, HIVE-17174.2.patch
>
>
> Currently, once the data is transferred, an `fadvise` call is invoked to throw 
> away the cached pages. This is not very helpful for broadcast edges, where the 
> same data tends to be transferred to multiple downstream tasks. 
> E.g., Q50 at 1 TB scale:
> {noformat}
>       Edges:
>         Map 1 <- Map 5 (BROADCAST_EDGE)
>         Map 6 <- Reducer 2 (BROADCAST_EDGE), Reducer 3 (BROADCAST_EDGE), Reducer 4 (BROADCAST_EDGE)
>         Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
>         Reducer 3 <- Map 1 (CUSTOM_SIMPLE_EDGE)
>         Reducer 4 <- Map 1 (CUSTOM_SIMPLE_EDGE)
>         Reducer 7 <- Map 1 (CUSTOM_SIMPLE_EDGE), Map 10 (BROADCAST_EDGE), Map 11 (BROADCAST_EDGE), Map 6 (CUSTOM_SIMPLE_EDGE)
>         Reducer 8 <- Reducer 7 (SIMPLE_EDGE)
>         Reducer 9 <- Reducer 8 (SIMPLE_EDGE)
> Status: Running (Executing on YARN cluster with App id application_1490656001509_6084)
> ----------------------------------------------------------------------------------------------
>         VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> ----------------------------------------------------------------------------------------------
> Map 5 ..........      llap     SUCCEEDED      1          1        0        0       0       0
> Map 1 ..........      llap     SUCCEEDED     11         11        0        0       0       0
> Reducer 4 ......      llap     SUCCEEDED      1          1        0        0       0       0
> Reducer 2 ......      llap     SUCCEEDED      1          1        0        0       0       0
> Reducer 3 ......      llap     SUCCEEDED      1          1        0        0       0       0
> Map 6 ..........      llap     SUCCEEDED    139        139        0        0       0       0
> Map 10 .........      llap     SUCCEEDED      1          1        0        0       0       0
> Map 11 .........      llap     SUCCEEDED      1          1        0        0       0       0
> Reducer 7 ......      llap     SUCCEEDED    834        834        0        0       0       0
> Reducer 8 ......      llap     SUCCEEDED     24         24        0        0       0       0
> Reducer 9 ......      llap     SUCCEEDED      1          1        0        0       0       0
> ----------------------------------------------------------------------------------------------
> e.g. count of evictions per file:
> 139 /grid/3/hadoop/yarn/local/usercache/rbalamohan/appcache/application_1490656001509_6084/1/output/attempt_1490656001509_6084_1_05_000000_0_18387/file.out
> 834 /grid/3/hadoop/yarn/local/usercache/rbalamohan/appcache/application_1490656001509_6084/1/output/attempt_1490656001509_6084_1_07_000000_0_18420_1/file.out
> 834 /grid/3/hadoop/yarn/local/usercache/rbalamohan/appcache/application_1490656001509_6084/1/output/attempt_1490656001509_6084_1_07_000000_0_18420_2/file.out
>    
> {noformat}
> It would be good to invoke fadvise only when "partition != 0". This would help 
> retain the pages for broadcast.
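
The proposal above can be sketched roughly as follows. This is an illustrative Python sketch, not the committed Hive patch: the function names `should_drop_pages` and `after_transfer` are hypothetical, and it assumes, as the description implies, that broadcast output is served as partition 0. The only real API used is `os.posix_fadvise`, which is available on some Unix platforms only.

```python
import os


def should_drop_pages(partition: int) -> bool:
    """Hypothetical policy: keep cached pages for partition 0 (assumed to be
    the broadcast case), drop them for every other partition."""
    return partition != 0


def after_transfer(fd: int, offset: int, length: int, partition: int) -> bool:
    """Run after a shuffle transfer completes.

    Issues a DONTNEED hint for the transferred range unless the policy says
    the pages should be retained. Returns True if the hint was issued.
    """
    if not should_drop_pages(partition):
        # Likely to be re-read by other downstream tasks: leave it cached.
        return False
    if hasattr(os, "posix_fadvise"):
        # Hint to the kernel that this byte range will not be needed again.
        os.posix_fadvise(fd, offset, length, os.POSIX_FADV_DONTNEED)
        return True
    # Platform without posix_fadvise: nothing to do.
    return False
```

With such a policy, a file served as partition 0 to many consumers would no longer receive one eviction hint per consumer (139 and 834 in the counts above).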



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
