[ https://issues.apache.org/jira/browse/HIVE-24688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17272812#comment-17272812 ]
László Bodor commented on HIVE-24688:
-------------------------------------
{code:java}
public Object copyObject(Object o) {
  ...
  if (o instanceof Text) {
    String str = ((Text)o).toString();
    HiveVarcharWritable hcw = new HiveVarcharWritable();
    hcw.set(str, ((VarcharTypeInfo)typeInfo).getLength());
    return hcw;
  }
{code}
Here we end up decoding a Text to a String (toString()) and encoding it back
(hcw.set) just to enforce a max length. I guess there is a better way: e.g. if
the Text is already truncated to a proper length, we can simply byte-copy its
value. However, according to the flamegraph, the allocation is mostly about
the byte[], so the byte copy is also a problem. I'm afraid we do have to copy,
though, because the writable is reused in the RowContainer loop:
{code}
this.currentReadBlock[i++] = (ROW) ObjectInspectorUtils.copyToStandardObject(
    serde.deserialize(val), serde.getObjectInspector(),
    ObjectInspectorCopyOption.WRITABLE, false);
{code}
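To make the byte-copy idea concrete, here is a rough sketch in plain Java, not Hive code. It assumes the varchar max length is counted in Unicode code points and that the input is valid UTF-8; the class and method names (VarcharByteCopy, byteLengthForChars, copyTruncated) are hypothetical. It only skips the intermediate String; the byte[] allocation for the copy itself remains, so it would address just part of what the flamegraph shows.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of Hive or Hadoop.
public class VarcharByteCopy {

    /**
     * Returns how many leading bytes of a valid UTF-8 buffer hold at most
     * maxChars code points. Scans the bytes directly; no String is created.
     */
    public static int byteLengthForChars(byte[] utf8, int length, int maxChars) {
        int chars = 0;
        for (int pos = 0; pos < length; pos++) {
            // Any byte that is not a continuation byte (10xxxxxx) starts a code point.
            if ((utf8[pos] & 0xC0) != 0x80) {
                if (chars == maxChars) {
                    return pos; // the next code point would exceed the limit
                }
                chars++;
            }
        }
        return length; // already within the limit: the whole value can be copied
    }

    /** Copies the value, cut at a code point boundary if it is too long. */
    public static byte[] copyTruncated(byte[] utf8, int length, int maxChars) {
        return Arrays.copyOf(utf8, byteLengthForChars(utf8, length, maxChars));
    }

    public static void main(String[] args) {
        byte[] src = "h\u00e9llo w\u00f6rld".getBytes(StandardCharsets.UTF_8);
        byte[] out = copyTruncated(src, src.length, 5);
        System.out.println(out.length + " bytes kept of " + src.length);
    }
}
```

With a Text, the inputs would presumably be getBytes() and getLength(), since the backing array can be longer than the valid data.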
> Optimise ObjectInspectorUtils.copyToStandardObject
> --------------------------------------------------
>
> Key: HIVE-24688
> URL: https://issues.apache.org/jira/browse/HIVE-24688
> Project: Hive
> Issue Type: Improvement
> Reporter: László Bodor
> Assignee: László Bodor
> Priority: Major
> Attachments: Screen Shot 2021-01-27 at 9.52.32 AM.png
>
>
> It's not necessarily copyToStandardObject itself that should be optimized, but
> we need to consider some optimization on the attached code path.
> In a customer case, 3 reducer tasks run forever (handling skewed keys), and
> most of the time is spent on this code path, with heavy GC activity. At the
> moment I'm open to any kind of optimization:
> 1. Do we need to copy the Text? Can't we return a reference instead?
> !Screen Shot 2021-01-27 at 9.52.32 AM.png|width=652,height=280!