[ 
https://issues.apache.org/jira/browse/HIVE-4103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13635782#comment-13635782
 ] 

Gunther Hagleitner commented on HIVE-4103:
------------------------------------------

I took some time to test the two versions of the code. I ran a number of 
mapjoins, ranging from small to at the limit and finally over the limit. In 
summary: without the gc calls we overestimate the used memory very slightly. 
The largest overestimate I've seen is ~1%. The errors, by the way, always make 
the estimates more conservative, never less. The performance benefit, on the 
other hand, is quite substantial: on the largest run the local task went from 
120s to 56s with Gopal's patch. I think we should move forward with this.
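
To make the trade-off concrete, here is a minimal sketch of a memory check that reads heap usage without forcing a collection first. It is not Hive's actual HashMapWrapper code: the class name, the method names and the 0.90 threshold are hypothetical and chosen for illustration only. Because no System.gc() runs before the reading, the used figure may include garbage that has not been collected yet, which is why the estimate can only err on the high (conservative) side.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

/** Hypothetical sketch of a usage-rate check; not Hive's real implementation. */
public class MemoryCheckSketch {
    private static final MemoryMXBean MEMORY = ManagementFactory.getMemoryMXBean();

    /** Fraction of the maximum heap currently reported as used. */
    static double usageRate(long usedBytes, long maxBytes) {
        return (double) usedBytes / (double) maxBytes;
    }

    /** True when the usage rate exceeds the (assumed) abort threshold. */
    static boolean shouldAbort(long usedBytes, long maxBytes, double threshold) {
        return usageRate(usedBytes, maxBytes) > threshold;
    }

    public static void main(String[] args) {
        // Read heap usage as-is; no System.gc() call before the measurement.
        MemoryUsage heap = MEMORY.getHeapMemoryUsage();
        long used = heap.getUsed();
        long max = heap.getMax();
        if (max <= 0) {
            // getMax() returns -1 when the maximum is undefined; fall back to Runtime.
            max = Runtime.getRuntime().maxMemory();
        }
        double threshold = 0.90; // illustrative value, an assumption of this sketch
        System.out.println("Memory usage: " + used
                + " rate: " + usageRate(used, max)
                + " abort: " + shouldAbort(used, max, threshold));
    }
}
```

With the final reading of the patched large run below (848652056 bytes used out of 1065484288), usageRate gives about 0.796, matching the logged rate.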

Largest run:

With patch:

{noformat}
2013-04-18 05:29:36     Starting to launch local task to process map join;      
maximum memory = 1065484288
2013-04-18 05:29:42     Processing rows:        200000  Hashtable size: 199999  
Memory usage:   108807528       rate:   0.102
2013-04-18 05:29:44     Processing rows:        300000  Hashtable size: 299999  
Memory usage:   158575416       rate:   0.149
2013-04-18 05:29:46     Processing rows:        400000  Hashtable size: 399999  
Memory usage:   211033848       rate:   0.198
2013-04-18 05:29:48     Processing rows:        500000  Hashtable size: 499999  
Memory usage:   260673400       rate:   0.245
2013-04-18 05:29:50     Processing rows:        600000  Hashtable size: 599999  
Memory usage:   310156256       rate:   0.291
2013-04-18 05:29:53     Processing rows:        700000  Hashtable size: 699999  
Memory usage:   359750536       rate:   0.338
2013-04-18 05:29:54     Processing rows:        800000  Hashtable size: 799999  
Memory usage:   417989768       rate:   0.392
2013-04-18 05:29:57     Processing rows:        900000  Hashtable size: 899999  
Memory usage:   460568536       rate:   0.432
2013-04-18 05:29:58     Processing rows:        1000000 Hashtable size: 999999  
Memory usage:   510475320       rate:   0.479
2013-04-18 05:30:01     Processing rows:        1100000 Hashtable size: 1099999 
Memory usage:   559513584       rate:   0.525
2013-04-18 05:30:03     Processing rows:        1200000 Hashtable size: 1199999 
Memory usage:   609277088       rate:   0.572
2013-04-18 05:30:06     Processing rows:        1300000 Hashtable size: 1299999 
Memory usage:   659366968       rate:   0.619
2013-04-18 05:30:07     Processing rows:        1400000 Hashtable size: 1399999 
Memory usage:   708744832       rate:   0.665
2013-04-18 05:30:08     Processing rows:        1500000 Hashtable size: 1499999 
Memory usage:   758335688       rate:   0.712
2013-04-18 05:30:13     Processing rows:        1600000 Hashtable size: 1599999 
Memory usage:   825625224       rate:   0.775
2013-04-18 05:30:14     Processing rows:        1646400 Hashtable size: 1646400 
Memory usage:   848652056       rate:   0.796
2013-04-18 05:30:14     Dump the hashtable into file: 
file:/tmp/hrt_qa/hive_2013-04-18_17-29-31_517_1864473086199137375/-local-10002/HashTable-Stage-1/MapJoin-cd-11--.hashtable
2013-04-18 05:30:32     Upload 1 File to: 
file:/tmp/hrt_qa/hive_2013-04-18_17-29-31_517_1864473086199137375/-local-10002/HashTable-Stage-1/MapJoin-cd-11--.hashtable
 File size: 127593266
2013-04-18 05:30:32     End of local task; Time Taken: 56.264 sec.
{noformat}

Without patch:

{noformat}
2013-04-18 05:55:22     Starting to launch local task to process map join;      
maximum memory = 1065484288
2013-04-18 05:55:29     Processing rows:        200000  Hashtable size: 199999  
Memory usage:   108779608       rate:   0.102
2013-04-18 05:55:33     Processing rows:        300000  Hashtable size: 299999  
Memory usage:   157203744       rate:   0.148
2013-04-18 05:55:37     Processing rows:        400000  Hashtable size: 399999  
Memory usage:   208667552       rate:   0.196
2013-04-18 05:55:42     Processing rows:        500000  Hashtable size: 499999  
Memory usage:   258126352       rate:   0.242
2013-04-18 05:55:46     Processing rows:        600000  Hashtable size: 599999  
Memory usage:   307734104       rate:   0.289
2013-04-18 05:55:51     Processing rows:        700000  Hashtable size: 699999  
Memory usage:   357043768       rate:   0.335
2013-04-18 05:55:57     Processing rows:        800000  Hashtable size: 799999  
Memory usage:   415059928       rate:   0.39
2013-04-18 05:56:04     Processing rows:        900000  Hashtable size: 899999  
Memory usage:   460135344       rate:   0.432
2013-04-18 05:56:10     Processing rows:        1000000 Hashtable size: 999999  
Memory usage:   509690176       rate:   0.478
2013-04-18 05:56:18     Processing rows:        1100000 Hashtable size: 1099999 
Memory usage:   559042448       rate:   0.525
2013-04-18 05:56:25     Processing rows:        1200000 Hashtable size: 1199999 
Memory usage:   608652728       rate:   0.571
2013-04-18 05:56:33     Processing rows:        1300000 Hashtable size: 1299999 
Memory usage:   657959872       rate:   0.618
2013-04-18 05:56:42     Processing rows:        1400000 Hashtable size: 1399999 
Memory usage:   707441328       rate:   0.664
2013-04-18 05:56:51     Processing rows:        1500000 Hashtable size: 1499999 
Memory usage:   756877400       rate:   0.71
2013-04-18 05:57:01     Processing rows:        1600000 Hashtable size: 1599999 
Memory usage:   823254568       rate:   0.773
2013-04-18 05:57:12     Processing rows:        1646400 Hashtable size: 1646400 
Memory usage:   837505640       rate:   0.786
2013-04-18 05:57:12     Dump the hashtable into file: 
file:/tmp/hrt_qa/hive_2013-04-18_17-55-16_725_2217141373925203269/-local-10002/HashTable-Stage-1/MapJoin-cd-01--.hashtable
2013-04-18 05:57:22     Upload 1 File to: 
file:/tmp/hrt_qa/hive_2013-04-18_17-55-16_725_2217141373925203269/-local-10002/HashTable-Stage-1/MapJoin-cd-01--.hashtable
 File size: 127593266
2013-04-18 05:57:22     End of local task; Time Taken: 120.19 sec.
{noformat}
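
The final readings of the two largest-run logs above can be compared directly. The following sketch only redoes the arithmetic on numbers copied from those logs; the class and method names are made up for this illustration.

```java
/** Hypothetical helper that compares the two largest-run logs. */
public class RunComparison {
    // Values copied from the logs above.
    static final long MAX_MEMORY = 1065484288L;
    static final long USED_WITH_PATCH = 848652056L;    // no gc calls
    static final long USED_WITHOUT_PATCH = 837505640L; // gc calls before measuring
    static final double SECONDS_WITH_PATCH = 56.264;
    static final double SECONDS_WITHOUT_PATCH = 120.19;

    /** Extra memory reported without gc calls, as a fraction of the maximum heap. */
    static double overestimateOfMax() {
        return (double) (USED_WITH_PATCH - USED_WITHOUT_PATCH) / MAX_MEMORY;
    }

    /** Extra memory reported without gc calls, relative to the gc-based reading. */
    static double overestimateRelative() {
        return (double) (USED_WITH_PATCH - USED_WITHOUT_PATCH) / USED_WITHOUT_PATCH;
    }

    /** How many times faster the local task finished with the patch. */
    static double speedup() {
        return SECONDS_WITHOUT_PATCH / SECONDS_WITH_PATCH;
    }

    public static void main(String[] args) {
        System.out.println("Overestimate (fraction of max heap): " + overestimateOfMax());
        System.out.println("Overestimate (relative to gc reading): " + overestimateRelative());
        System.out.println("Speedup: " + speedup());
    }
}
```

For this run the difference is about one percentage point of the maximum heap (rate 0.796 versus 0.786), and the patched local task is a little more than twice as fast.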

Small run:

Without patch:

{noformat}
2013-04-18 05:08:06     Starting to launch local task to process map join;      
maximum memory = 1065484288
2013-04-18 05:08:15     Processing rows:        200000  Hashtable size: 199999  
Memory usage:   109063800       rate:   0.102
2013-04-18 05:08:18     Processing rows:        274400  Hashtable size: 274400  
Memory usage:   144697408       rate:   0.136
2013-04-18 05:08:18     Dump the hashtable into file: 
file:/tmp/hrt_qa/hive_2013-04-18_17-08-02_486_1659630314838477132/-local-10002/HashTable-Stage-1/MapJoin-cd-51--.hashtable
2013-04-18 05:08:21     Upload 1 File to: 
file:/tmp/hrt_qa/hive_2013-04-18_17-08-02_486_1659630314838477132/-local-10002/HashTable-Stage-1/MapJoin-cd-51--.hashtable
 File size: 21263116
2013-04-18 05:08:21     End of local task; Time Taken: 14.35 sec.
{noformat}

With patch:

{noformat}
2013-04-18 05:11:45     Starting to launch local task to process map join;      
maximum memory = 1065484288
2013-04-18 05:11:52     Processing rows:        200000  Hashtable size: 199999  
Memory usage:   109177944       rate:   0.102
2013-04-18 05:11:54     Processing rows:        274400  Hashtable size: 274400  
Memory usage:   146485584       rate:   0.137
2013-04-18 05:11:54     Dump the hashtable into file: 
file:/tmp/hrt_qa/hive_2013-04-18_17-11-39_065_4048995808819270025/-local-10002/HashTable-Stage-1/MapJoin-cd-01--.hashtable
2013-04-18 05:11:57     Upload 1 File to: 
file:/tmp/hrt_qa/hive_2013-04-18_17-11-39_065_4048995808819270025/-local-10002/HashTable-Stage-1/MapJoin-cd-01--.hashtable
 File size: 21263116
2013-04-18 05:11:57     End of local task; Time Taken: 12.226 sec.
{noformat}

                
> Remove System.gc() call from the map-join local-task loop
> ---------------------------------------------------------
>
>                 Key: HIVE-4103
>                 URL: https://issues.apache.org/jira/browse/HIVE-4103
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Gopal V
>            Assignee: Gopal V
>            Priority: Minor
>         Attachments: HIVE-4103.patch
>
>
> Hive's HashMapWrapper calls System.gc() twice within 
> HashMapWrapper::isAbort(), which produces a significant slowdown during the 
> loop.
> {code}
> 2013-03-01 04:54:28 The gc calls took 677 ms
> 2013-03-01 04:54:28     Processing rows:        200000  Hashtable size: 
> 199999  Memory usage:   62955432        rate:   0.033
> 2013-03-01 04:54:31 The gc calls took 956 ms
> 2013-03-01 04:54:31     Processing rows:        300000  Hashtable size: 
> 299999  Memory usage:   90826656        rate:   0.048
> 2013-03-01 04:54:33 The gc calls took 967 ms
> 2013-03-01 04:54:33     Processing rows:        384160  Hashtable size: 
> 384160  Memory usage:   114412712       rate:   0.06
> {code}

