[
https://issues.apache.org/jira/browse/HIVE-4103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13635782#comment-13635782
]
Gunther Hagleitner commented on HIVE-4103:
------------------------------------------
I took some time to test the two versions of the code. I ran a number of
mapjoins, ranging from small, to right at the limit, to over the limit. In
summary: without the gc calls we overestimate the used memory very slightly.
The largest error I've seen is ~1%. The errors always make the estimates
more conservative, never less. The performance benefit, on the other hand, is
quite substantial: the largest run went from 120s to 56s with Gopal's
patch. I think we should move forward with this.
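To illustrate why skipping the gc calls can only push the estimate upward, here is a minimal Java sketch of a heap-usage check that reads the current usage without forcing a collection. It is not Hive's actual code: the class name, the method names, and the 0.90 threshold are hypothetical and chosen only for this example. The reading may include garbage that has not been collected yet, so it can be higher than the live data but not lower.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

/** Hypothetical helper for illustration; not Hive's HashMapWrapper. */
class MemoryRateSketch {
    private static final MemoryMXBean MEMORY = ManagementFactory.getMemoryMXBean();

    /** Heap bytes currently in use, read without calling System.gc(). */
    static long currentUsedBytes() {
        // May include uncollected garbage, so this can overestimate live data.
        return MEMORY.getHeapMemoryUsage().getUsed();
    }

    /** Used memory as a fraction of the limit (the "rate" column in the logs). */
    static double usedRate(long usedBytes, long maxBytes) {
        return (double) usedBytes / (double) maxBytes;
    }

    /** True when the used fraction exceeds the (illustrative) threshold. */
    static boolean shouldAbort(long usedBytes, long maxBytes, double threshold) {
        return usedRate(usedBytes, maxBytes) > threshold;
    }

    public static void main(String[] args) {
        long max = 1065484288L; // "maximum memory" from the logs in this comment
        // Final readings of the largest run, with and without the patch.
        System.out.printf("with patch:    %.3f%n", usedRate(848652056L, max));
        System.out.printf("without patch: %.3f%n", usedRate(837505640L, max));
        // 0.90 is an assumed threshold, used here only as an example value.
        System.out.println("abort? " + shouldAbort(currentUsedBytes(), max, 0.90));
    }
}
```

Because the only possible error is counting dead objects as used, a check like this would abort slightly too early rather than too late.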
Largest run:
With patch:
{noformat}
2013-04-18 05:29:36 Starting to launch local task to process map join;
maximum memory = 1065484288
2013-04-18 05:29:42 Processing rows: 200000 Hashtable size: 199999
Memory usage: 108807528 rate: 0.102
2013-04-18 05:29:44 Processing rows: 300000 Hashtable size: 299999
Memory usage: 158575416 rate: 0.149
2013-04-18 05:29:46 Processing rows: 400000 Hashtable size: 399999
Memory usage: 211033848 rate: 0.198
2013-04-18 05:29:48 Processing rows: 500000 Hashtable size: 499999
Memory usage: 260673400 rate: 0.245
2013-04-18 05:29:50 Processing rows: 600000 Hashtable size: 599999
Memory usage: 310156256 rate: 0.291
2013-04-18 05:29:53 Processing rows: 700000 Hashtable size: 699999
Memory usage: 359750536 rate: 0.338
2013-04-18 05:29:54 Processing rows: 800000 Hashtable size: 799999
Memory usage: 417989768 rate: 0.392
2013-04-18 05:29:57 Processing rows: 900000 Hashtable size: 899999
Memory usage: 460568536 rate: 0.432
2013-04-18 05:29:58 Processing rows: 1000000 Hashtable size: 999999
Memory usage: 510475320 rate: 0.479
2013-04-18 05:30:01 Processing rows: 1100000 Hashtable size: 1099999
Memory usage: 559513584 rate: 0.525
2013-04-18 05:30:03 Processing rows: 1200000 Hashtable size: 1199999
Memory usage: 609277088 rate: 0.572
2013-04-18 05:30:06 Processing rows: 1300000 Hashtable size: 1299999
Memory usage: 659366968 rate: 0.619
2013-04-18 05:30:07 Processing rows: 1400000 Hashtable size: 1399999
Memory usage: 708744832 rate: 0.665
2013-04-18 05:30:08 Processing rows: 1500000 Hashtable size: 1499999
Memory usage: 758335688 rate: 0.712
2013-04-18 05:30:13 Processing rows: 1600000 Hashtable size: 1599999
Memory usage: 825625224 rate: 0.775
2013-04-18 05:30:14 Processing rows: 1646400 Hashtable size: 1646400
Memory usage: 848652056 rate: 0.796
2013-04-18 05:30:14 Dump the hashtable into file:
file:/tmp/hrt_qa/hive_2013-04-18_17-29-31_517_1864473086199137375/-local-10002/HashTable-Stage-1/MapJoin-cd-11--.hashtable
2013-04-18 05:30:32 Upload 1 File to:
file:/tmp/hrt_qa/hive_2013-04-18_17-29-31_517_1864473086199137375/-local-10002/HashTable-Stage-1/MapJoin-cd-11--.hashtable
File size: 127593266
2013-04-18 05:30:32 End of local task; Time Taken: 56.264 sec.
{noformat}
Without patch:
{noformat}
2013-04-18 05:55:22 Starting to launch local task to process map join;
maximum memory = 1065484288
2013-04-18 05:55:29 Processing rows: 200000 Hashtable size: 199999
Memory usage: 108779608 rate: 0.102
2013-04-18 05:55:33 Processing rows: 300000 Hashtable size: 299999
Memory usage: 157203744 rate: 0.148
2013-04-18 05:55:37 Processing rows: 400000 Hashtable size: 399999
Memory usage: 208667552 rate: 0.196
2013-04-18 05:55:42 Processing rows: 500000 Hashtable size: 499999
Memory usage: 258126352 rate: 0.242
2013-04-18 05:55:46 Processing rows: 600000 Hashtable size: 599999
Memory usage: 307734104 rate: 0.289
2013-04-18 05:55:51 Processing rows: 700000 Hashtable size: 699999
Memory usage: 357043768 rate: 0.335
2013-04-18 05:55:57 Processing rows: 800000 Hashtable size: 799999
Memory usage: 415059928 rate: 0.39
2013-04-18 05:56:04 Processing rows: 900000 Hashtable size: 899999
Memory usage: 460135344 rate: 0.432
2013-04-18 05:56:10 Processing rows: 1000000 Hashtable size: 999999
Memory usage: 509690176 rate: 0.478
2013-04-18 05:56:18 Processing rows: 1100000 Hashtable size: 1099999
Memory usage: 559042448 rate: 0.525
2013-04-18 05:56:25 Processing rows: 1200000 Hashtable size: 1199999
Memory usage: 608652728 rate: 0.571
2013-04-18 05:56:33 Processing rows: 1300000 Hashtable size: 1299999
Memory usage: 657959872 rate: 0.618
2013-04-18 05:56:42 Processing rows: 1400000 Hashtable size: 1399999
Memory usage: 707441328 rate: 0.664
2013-04-18 05:56:51 Processing rows: 1500000 Hashtable size: 1499999
Memory usage: 756877400 rate: 0.71
2013-04-18 05:57:01 Processing rows: 1600000 Hashtable size: 1599999
Memory usage: 823254568 rate: 0.773
2013-04-18 05:57:12 Processing rows: 1646400 Hashtable size: 1646400
Memory usage: 837505640 rate: 0.786
2013-04-18 05:57:12 Dump the hashtable into file:
file:/tmp/hrt_qa/hive_2013-04-18_17-55-16_725_2217141373925203269/-local-10002/HashTable-Stage-1/MapJoin-cd-01--.hashtable
2013-04-18 05:57:22 Upload 1 File to:
file:/tmp/hrt_qa/hive_2013-04-18_17-55-16_725_2217141373925203269/-local-10002/HashTable-Stage-1/MapJoin-cd-01--.hashtable
File size: 127593266
2013-04-18 05:57:22 End of local task; Time Taken: 120.19 sec.
{noformat}
Small run:
Without patch:
{noformat}
2013-04-18 05:08:06 Starting to launch local task to process map join;
maximum memory = 1065484288
2013-04-18 05:08:15 Processing rows: 200000 Hashtable size: 199999
Memory usage: 109063800 rate: 0.102
2013-04-18 05:08:18 Processing rows: 274400 Hashtable size: 274400
Memory usage: 144697408 rate: 0.136
2013-04-18 05:08:18 Dump the hashtable into file:
file:/tmp/hrt_qa/hive_2013-04-18_17-08-02_486_1659630314838477132/-local-10002/HashTable-Stage-1/MapJoin-cd-51--.hashtable
2013-04-18 05:08:21 Upload 1 File to:
file:/tmp/hrt_qa/hive_2013-04-18_17-08-02_486_1659630314838477132/-local-10002/HashTable-Stage-1/MapJoin-cd-51--.hashtable
File size: 21263116
2013-04-18 05:08:21 End of local task; Time Taken: 14.35 sec.
{noformat}
With patch:
{noformat}
2013-04-18 05:11:45 Starting to launch local task to process map join;
maximum memory = 1065484288
2013-04-18 05:11:52 Processing rows: 200000 Hashtable size: 199999
Memory usage: 109177944 rate: 0.102
2013-04-18 05:11:54 Processing rows: 274400 Hashtable size: 274400
Memory usage: 146485584 rate: 0.137
2013-04-18 05:11:54 Dump the hashtable into file:
file:/tmp/hrt_qa/hive_2013-04-18_17-11-39_065_4048995808819270025/-local-10002/HashTable-Stage-1/MapJoin-cd-01--.hashtable
2013-04-18 05:11:57 Upload 1 File to:
file:/tmp/hrt_qa/hive_2013-04-18_17-11-39_065_4048995808819270025/-local-10002/HashTable-Stage-1/MapJoin-cd-01--.hashtable
File size: 21263116
2013-04-18 05:11:57 End of local task; Time Taken: 12.226 sec.
{noformat}
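The numbers in the four logs above can be compared with a few lines of arithmetic. The Java sketch below is only a worked calculation over values copied from those logs; the class and method names are hypothetical.

```java
/** Worked arithmetic over the logged values; names are illustrative only. */
class RunComparison {
    /** How many times faster the patched run is. */
    static double speedup(double withoutPatchSec, double withPatchSec) {
        return withoutPatchSec / withPatchSec;
    }

    /** Extra memory reported with the patch, as a fraction of the limit. */
    static double overestimateOfMax(long withPatch, long withoutPatch, long maxBytes) {
        return (double) (withPatch - withoutPatch) / (double) maxBytes;
    }

    public static void main(String[] args) {
        long max = 1065484288L; // "maximum memory" in every run

        // "Time Taken" values from the end of each log.
        System.out.println("large run speedup: " + speedup(120.19, 56.264));
        System.out.println("small run speedup: " + speedup(14.35, 12.226));

        // Final "Memory usage" values from each log.
        System.out.println("large run overestimate: "
                + overestimateOfMax(848652056L, 837505640L, max));
        System.out.println("small run overestimate: "
                + overestimateOfMax(146485584L, 144697408L, max));
    }
}
```

This gives a speedup of roughly 2.1x on the largest run and roughly 1.17x on the small run, with the patched runs reporting about 1.0% and 0.2% of the memory limit more than the unpatched ones.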
> Remove System.gc() call from the map-join local-task loop
> ---------------------------------------------------------
>
> Key: HIVE-4103
> URL: https://issues.apache.org/jira/browse/HIVE-4103
> Project: Hive
> Issue Type: Bug
> Reporter: Gopal V
> Assignee: Gopal V
> Priority: Minor
> Attachments: HIVE-4103.patch
>
>
> Hive's HashMapWrapper calls System.gc() twice within
> HashMapWrapper::isAbort(), which produces a significant slow-down during the
> loop.
> {code}
> 2013-03-01 04:54:28 The gc calls took 677 ms
> 2013-03-01 04:54:28 Processing rows: 200000 Hashtable size:
> 199999 Memory usage: 62955432 rate: 0.033
> 2013-03-01 04:54:31 The gc calls took 956 ms
> 2013-03-01 04:54:31 Processing rows: 300000 Hashtable size:
> 299999 Memory usage: 90826656 rate: 0.048
> 2013-03-01 04:54:33 The gc calls took 967 ms
> 2013-03-01 04:54:33 Processing rows: 384160 Hashtable size:
> 384160 Memory usage: 114412712 rate: 0.06
> {code}
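A measurement like "The gc calls took 677 ms" in the quoted description could be taken with a timer around the explicit collections. The Java sketch below is an assumption about how such a number might be produced, not the instrumentation actually used; the class and method names are hypothetical.

```java
/** Hypothetical timing harness; not the instrumentation used in the issue. */
class GcTimingSketch {
    /** Calls System.gc() the given number of times and returns elapsed milliseconds. */
    static long timeGcCallsMillis(int calls) {
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            // System.gc() is only a request; the JVM may ignore it,
            // for example when started with -XX:+DisableExplicitGC.
            System.gc();
        }
        return (System.nanoTime() - start) / 1_000_000L;
    }

    public static void main(String[] args) {
        // Two calls, matching the "twice" in the issue description.
        System.out.println("The gc calls took " + timeGcCallsMillis(2) + " ms");
    }
}
```

The elapsed time depends heavily on heap size and collector, so the output of this sketch is not expected to match the figures quoted above.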