[ https://issues.apache.org/jira/browse/HIVE-4103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13635782#comment-13635782 ]
Gunther Hagleitner commented on HIVE-4103:
------------------------------------------

I took some time to test the two versions of the code. I ran a number of mapjoins, ranging from small, to right at the memory limit, and finally over the limit.

In summary: without the gc calls we overestimate the used memory very slightly. The biggest overestimate I've seen is ~1%. The errors, by the way, always make the estimates more conservative, never less.

The performance benefit, on the other hand, is quite substantial: the largest run went from 120s to 56s with Gopal's patch.

I think we should move forward with this.

Largest run:

With patch:

{noformat}
2013-04-18 05:29:36 Starting to launch local task to process map join; maximum memory = 1065484288
2013-04-18 05:29:42 Processing rows: 200000 Hashtable size: 199999 Memory usage: 108807528 rate: 0.102
2013-04-18 05:29:44 Processing rows: 300000 Hashtable size: 299999 Memory usage: 158575416 rate: 0.149
2013-04-18 05:29:46 Processing rows: 400000 Hashtable size: 399999 Memory usage: 211033848 rate: 0.198
2013-04-18 05:29:48 Processing rows: 500000 Hashtable size: 499999 Memory usage: 260673400 rate: 0.245
2013-04-18 05:29:50 Processing rows: 600000 Hashtable size: 599999 Memory usage: 310156256 rate: 0.291
2013-04-18 05:29:53 Processing rows: 700000 Hashtable size: 699999 Memory usage: 359750536 rate: 0.338
2013-04-18 05:29:54 Processing rows: 800000 Hashtable size: 799999 Memory usage: 417989768 rate: 0.392
2013-04-18 05:29:57 Processing rows: 900000 Hashtable size: 899999 Memory usage: 460568536 rate: 0.432
2013-04-18 05:29:58 Processing rows: 1000000 Hashtable size: 999999 Memory usage: 510475320 rate: 0.479
2013-04-18 05:30:01 Processing rows: 1100000 Hashtable size: 1099999 Memory usage: 559513584 rate: 0.525
2013-04-18 05:30:03 Processing rows: 1200000 Hashtable size: 1199999 Memory usage: 609277088 rate: 0.572
2013-04-18 05:30:06 Processing rows: 1300000 Hashtable size: 1299999 Memory usage: 659366968 rate: 0.619
2013-04-18 05:30:07 Processing rows: 1400000 Hashtable size: 1399999 Memory usage: 708744832 rate: 0.665
2013-04-18 05:30:08 Processing rows: 1500000 Hashtable size: 1499999 Memory usage: 758335688 rate: 0.712
2013-04-18 05:30:13 Processing rows: 1600000 Hashtable size: 1599999 Memory usage: 825625224 rate: 0.775
2013-04-18 05:30:14 Processing rows: 1646400 Hashtable size: 1646400 Memory usage: 848652056 rate: 0.796
2013-04-18 05:30:14 Dump the hashtable into file: file:/tmp/hrt_qa/hive_2013-04-18_17-29-31_517_1864473086199137375/-local-10002/HashTable-Stage-1/MapJoin-cd-11--.hashtable
2013-04-18 05:30:32 Upload 1 File to: file:/tmp/hrt_qa/hive_2013-04-18_17-29-31_517_1864473086199137375/-local-10002/HashTable-Stage-1/MapJoin-cd-11--.hashtable File size: 127593266
2013-04-18 05:30:32 End of local task; Time Taken: 56.264 sec.
{noformat}

Without patch:

{noformat}
2013-04-18 05:55:22 Starting to launch local task to process map join; maximum memory = 1065484288
2013-04-18 05:55:29 Processing rows: 200000 Hashtable size: 199999 Memory usage: 108779608 rate: 0.102
2013-04-18 05:55:33 Processing rows: 300000 Hashtable size: 299999 Memory usage: 157203744 rate: 0.148
2013-04-18 05:55:37 Processing rows: 400000 Hashtable size: 399999 Memory usage: 208667552 rate: 0.196
2013-04-18 05:55:42 Processing rows: 500000 Hashtable size: 499999 Memory usage: 258126352 rate: 0.242
2013-04-18 05:55:46 Processing rows: 600000 Hashtable size: 599999 Memory usage: 307734104 rate: 0.289
2013-04-18 05:55:51 Processing rows: 700000 Hashtable size: 699999 Memory usage: 357043768 rate: 0.335
2013-04-18 05:55:57 Processing rows: 800000 Hashtable size: 799999 Memory usage: 415059928 rate: 0.39
2013-04-18 05:56:04 Processing rows: 900000 Hashtable size: 899999 Memory usage: 460135344 rate: 0.432
2013-04-18 05:56:10 Processing rows: 1000000 Hashtable size: 999999 Memory usage: 509690176 rate: 0.478
2013-04-18 05:56:18 Processing rows: 1100000 Hashtable size: 1099999 Memory usage: 559042448 rate: 0.525
2013-04-18 05:56:25 Processing rows: 1200000 Hashtable size: 1199999 Memory usage: 608652728 rate: 0.571
2013-04-18 05:56:33 Processing rows: 1300000 Hashtable size: 1299999 Memory usage: 657959872 rate: 0.618
2013-04-18 05:56:42 Processing rows: 1400000 Hashtable size: 1399999 Memory usage: 707441328 rate: 0.664
2013-04-18 05:56:51 Processing rows: 1500000 Hashtable size: 1499999 Memory usage: 756877400 rate: 0.71
2013-04-18 05:57:01 Processing rows: 1600000 Hashtable size: 1599999 Memory usage: 823254568 rate: 0.773
2013-04-18 05:57:12 Processing rows: 1646400 Hashtable size: 1646400 Memory usage: 837505640 rate: 0.786
2013-04-18 05:57:12 Dump the hashtable into file: file:/tmp/hrt_qa/hive_2013-04-18_17-55-16_725_2217141373925203269/-local-10002/HashTable-Stage-1/MapJoin-cd-01--.hashtable
2013-04-18 05:57:22 Upload 1 File to: file:/tmp/hrt_qa/hive_2013-04-18_17-55-16_725_2217141373925203269/-local-10002/HashTable-Stage-1/MapJoin-cd-01--.hashtable File size: 127593266
2013-04-18 05:57:22 End of local task; Time Taken: 120.19 sec.
{noformat}

Small run:

Without patch:

{noformat}
2013-04-18 05:08:06 Starting to launch local task to process map join; maximum memory = 1065484288
2013-04-18 05:08:15 Processing rows: 200000 Hashtable size: 199999 Memory usage: 109063800 rate: 0.102
2013-04-18 05:08:18 Processing rows: 274400 Hashtable size: 274400 Memory usage: 144697408 rate: 0.136
2013-04-18 05:08:18 Dump the hashtable into file: file:/tmp/hrt_qa/hive_2013-04-18_17-08-02_486_1659630314838477132/-local-10002/HashTable-Stage-1/MapJoin-cd-51--.hashtable
2013-04-18 05:08:21 Upload 1 File to: file:/tmp/hrt_qa/hive_2013-04-18_17-08-02_486_1659630314838477132/-local-10002/HashTable-Stage-1/MapJoin-cd-51--.hashtable File size: 21263116
2013-04-18 05:08:21 End of local task; Time Taken: 14.35 sec.
{noformat}

With patch:

{noformat}
2013-04-18 05:11:45 Starting to launch local task to process map join; maximum memory = 1065484288
2013-04-18 05:11:52 Processing rows: 200000 Hashtable size: 199999 Memory usage: 109177944 rate: 0.102
2013-04-18 05:11:54 Processing rows: 274400 Hashtable size: 274400 Memory usage: 146485584 rate: 0.137
2013-04-18 05:11:54 Dump the hashtable into file: file:/tmp/hrt_qa/hive_2013-04-18_17-11-39_065_4048995808819270025/-local-10002/HashTable-Stage-1/MapJoin-cd-01--.hashtable
2013-04-18 05:11:57 Upload 1 File to: file:/tmp/hrt_qa/hive_2013-04-18_17-11-39_065_4048995808819270025/-local-10002/HashTable-Stage-1/MapJoin-cd-01--.hashtable File size: 21263116
2013-04-18 05:11:57 End of local task; Time Taken: 12.226 sec.
{noformat}

> Remove System.gc() call from the map-join local-task loop
> ---------------------------------------------------------
>
>                 Key: HIVE-4103
>                 URL: https://issues.apache.org/jira/browse/HIVE-4103
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Gopal V
>            Assignee: Gopal V
>            Priority: Minor
>         Attachments: HIVE-4103.patch
>
>
> Hive's HashMapWrapper calls System.gc() twice within
> HashMapWrapper::isAbort(), which produces a significant slow-down during the
> loop.
> {code}
> 2013-03-01 04:54:28 The gc calls took 677 ms
> 2013-03-01 04:54:28 Processing rows: 200000 Hashtable size:
> 199999 Memory usage: 62955432 rate: 0.033
> 2013-03-01 04:54:31 The gc calls took 956 ms
> 2013-03-01 04:54:31 Processing rows: 300000 Hashtable size:
> 299999 Memory usage: 90826656 rate: 0.048
> 2013-03-01 04:54:33 The gc calls took 967 ms
> 2013-03-01 04:54:33 Processing rows: 384160 Hashtable size:
> 384160 Memory usage: 114412712 rate: 0.06
> {code}
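
For readers following the trade-off in the comment above, here is a minimal Java sketch of a memory check that reads heap usage without first forcing a collection. It is not Hive's HashMapWrapper code: the class name MemoryCheckSketch, its method names, and the 0.90 threshold are all hypothetical, chosen only for illustration. The "rate" it computes is assumed to be used bytes divided by maximum memory, which matches the figures in the logs (848652056 / 1065484288 is about 0.796).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

// Hypothetical illustration only; not the actual Hive implementation.
public class MemoryCheckSketch {
    private static final MemoryMXBean MEMORY_BEAN = ManagementFactory.getMemoryMXBean();

    /** Fraction of the memory limit currently in use (the "rate" column in the logs). */
    public static double usageRate(long usedBytes, long maxBytes) {
        return (double) usedBytes / (double) maxBytes;
    }

    /** True when the usage rate exceeds the given threshold (threshold is an assumed parameter). */
    public static boolean shouldAbort(long usedBytes, long maxBytes, double threshold) {
        return usageRate(usedBytes, maxBytes) > threshold;
    }

    /**
     * Reads the used heap without calling System.gc() first. The value may
     * include garbage that has not been collected yet, so it can only
     * overestimate live data, which keeps the check conservative.
     */
    public static long currentHeapUsed() {
        return MEMORY_BEAN.getHeapMemoryUsage().getUsed();
    }

    public static void main(String[] args) {
        long max = Runtime.getRuntime().maxMemory();
        long used = currentHeapUsed();
        System.out.printf("Memory usage: %d rate: %.3f%n", used, usageRate(used, max));

        // Figures taken from the last "Processing rows" line of the largest patched run.
        double logged = usageRate(848652056L, 1065484288L);
        System.out.printf("Largest run rate: %.3f abort at 0.90: %b%n",
                logged, shouldAbort(848652056L, 1065484288L, 0.90));
    }
}
```

Because uncollected garbage can only inflate the reading, skipping the gc calls errs toward aborting early rather than late, which is consistent with the slightly higher usage figures reported for the patched runs.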