[
https://issues.apache.org/jira/browse/HIVE-19668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16542792#comment-16542792
]
Hive QA commented on HIVE-19668:
--------------------------------
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 10s{color}
| {color:red}
/data/hiveptest/logs/PreCommit-HIVE-Build-12579/patches/PreCommit-HIVE-Build-12579.patch
does not apply to master. Rebase required? Wrong Branch? See
http://cwiki.apache.org/confluence/display/Hive/HowToContribute for help.
{color} |
\\
\\
|| Subsystem || Report/Notes ||
| Console output |
http://104.198.109.242/logs//PreCommit-HIVE-Build-12579/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |
This message was automatically generated.
> Over 30% of the heap wasted by duplicate org.antlr.runtime.CommonToken's and
> duplicate strings
> ----------------------------------------------------------------------------------------------
>
> Key: HIVE-19668
> URL: https://issues.apache.org/jira/browse/HIVE-19668
> Project: Hive
> Issue Type: Improvement
> Components: HiveServer2
> Affects Versions: 3.0.0
> Reporter: Misha Dmitriev
> Assignee: Misha Dmitriev
> Priority: Major
> Attachments: HIVE-19668.01.patch, HIVE-19668.02.patch,
> HIVE-19668.03.patch, image-2018-05-22-17-41-39-572.png
>
>
> I've recently analyzed a HS2 heap dump, obtained when there was a huge memory
> spike during compilation of some big query. The analysis was done with jxray
> ([www.jxray.com|http://www.jxray.com]). It turns out that more than 90% of
> the 20G heap was used by data structures associated with query parsing
> ({{org.apache.hadoop.hive.ql.parse.QBExpr}}). There are probably multiple
> opportunities for optimizations here. One of them is to stop the code from
> creating duplicate instances of {{org.antlr.runtime.CommonToken}} class. See
> a sample of these objects in the attached image:
> !image-2018-05-22-17-41-39-572.png|width=879,height=399!
> Looks like these particular {{CommonToken}} objects are constants that don't
> change once created. I see some code, e.g. in
> {{org.apache.hadoop.hive.ql.parse.CalcitePlanner}}, where such objects are
> apparently repeatedly created with e.g. {{new
> CommonToken(HiveParser.TOK_INSERT, "TOK_INSERT")}}. If these 33 token kinds
> are instead created once and reused, we will save more than 1/10th of the
> heap in this scenario. Plus, since these objects are small but very numerous,
> getting rid of them will remove a great deal of pressure from the GC.
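A minimal sketch of the create-once-and-reuse idea, not the attached patch. The nested Token class is a hypothetical stand-in for org.antlr.runtime.CommonToken so the example compiles without ANTLR, and TokenCache, get() and the numeric value of TOK_INSERT are made-up names for illustration. It assumes each token type always carries the same text, as in the TOK_INSERT example above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class TokenCache {
    // Hypothetical stand-in for org.antlr.runtime.CommonToken.
    // Unlike the real class, it is immutable.
    static final class Token {
        final int type;
        final String text;

        Token(int type, String text) {
            this.type = type;
            this.text = text;
        }
    }

    // Hypothetical value; the real constant is HiveParser.TOK_INSERT.
    static final int TOK_INSERT = 1;

    // One shared instance per token type (assumes one text per type).
    private static final Map<Integer, Token> CACHE = new ConcurrentHashMap<>();

    // Returns the cached token for this type, creating it on first use.
    static Token get(int type, String text) {
        return CACHE.computeIfAbsent(type, t -> new Token(t, text));
    }

    public static void main(String[] args) {
        Token a = get(TOK_INSERT, "TOK_INSERT");
        Token b = get(TOK_INSERT, "TOK_INSERT");
        System.out.println(a == b); // prints true: both calls share one object
    }
}
```

The real CommonToken has setters (setText, setType and so on), so sharing instances like this is only safe if nothing modifies them after creation. The description suggests that is the case, but it would need to be checked.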
> Another source of waste is duplicate strings, which collectively waste 26.1%
> of memory. Some of them come from CommonToken objects that have the same text
> (i.e. for multiple CommonToken objects the contents of their 'text' Strings
> are the same, but each has its own copy of that String). Other duplicate
> strings come from other sources and are easy enough to fix by adding
> String.intern() calls.
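A small sketch of what a String.intern() call does to duplicate strings. InternDemo, dedup() and the sample string are hypothetical names used only for illustration. Where such calls belong in Hive is not shown here.

```java
final class InternDemo {
    // Returns the canonical copy of s from the JVM string pool (null-safe).
    static String dedup(String s) {
        return s == null ? null : s.intern();
    }

    public static void main(String[] args) {
        // Two separate String objects with identical contents, as happens when
        // each token or plan object builds its own copy of the same text.
        String a = new String("some_table_name".toCharArray());
        String b = new String("some_table_name".toCharArray());

        System.out.println(a.equals(b));            // prints true: same contents
        System.out.println(a == b);                 // prints false: two objects
        System.out.println(dedup(a) == dedup(b));   // prints true: one shared copy
    }
}
```

Once the original references are replaced by the interned one, the duplicate copies become unreachable and can be collected. Interning has a lookup cost of its own, so it is usually applied only to strings that are known to repeat often.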
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)