[ https://issues.apache.org/jira/browse/HIVE-24688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17272812#comment-17272812 ]

László Bodor commented on HIVE-24688:
-------------------------------------

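{code:java}
  public Object copyObject(Object o) {
...

    if (o instanceof Text) {
      // Decode the UTF-8 bytes of the Text into a java.lang.String...
      String str = ((Text)o).toString();
      // ...then encode it back, enforcing the varchar maximum length.
      HiveVarcharWritable hcw = new HiveVarcharWritable();
      hcw.set(str, ((VarcharTypeInfo)typeInfo).getLength());
      return hcw;
    }
{code}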

Here we end up decoding a Text to a String (toString()) and encoding it back
(hcw.set) just because we want to enforce a max length. I guess there is a
better way: e.g. if the Text is already within the proper length, we can
simply byte-copy its value. However, according to the flamegraph, the
allocation is mostly about the byte[], so a byte copy is also a problem, but
I'm afraid we have to copy, as we reuse the writable in the RowContainer loop:
{code}
this.currentReadBlock[i++] = (ROW) ObjectInspectorUtils.copyToStandardObject(
    serde.deserialize(val), serde.getObjectInspector(),
    ObjectInspectorCopyOption.WRITABLE, false);
{code}
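The byte-copy idea from the comment can be sketched in plain Java, without Hadoop or Hive dependencies. This is a minimal sketch under stated assumptions, not Hive code: it works on a raw UTF-8 `byte[]` plus its valid length (Hadoop's `Text.getBytes()` returns a backing array that is only valid up to `Text.getLength()`), it assumes the varchar maximum length counts Unicode code points, and it assumes well-formed UTF-8. The names `VarcharByteCopy`, `truncationOffset` and `copyTruncated` are hypothetical. Note that it still allocates the destination `byte[]`, so it avoids the String round trip but not the copy itself, which is exactly the flamegraph caveat.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of Hive: copies varchar bytes without
// decoding them into a java.lang.String.
final class VarcharByteCopy {

    // Returns the byte offset just past the first maxChars code points in
    // utf8[0..len), or len if the input holds at most maxChars code points.
    // Assumes well-formed UTF-8: every byte that is not a continuation byte
    // (bit pattern 10xxxxxx) starts a new code point.
    static int truncationOffset(byte[] utf8, int len, int maxChars) {
        int chars = 0;
        for (int i = 0; i < len; i++) {
            if ((utf8[i] & 0xC0) != 0x80) {   // lead byte of a code point
                if (chars == maxChars) {
                    return i;                  // code point number maxChars+1 starts here
                }
                chars++;
            }
        }
        return len;
    }

    // Copies at most maxChars code points; if the value already fits, this is
    // a plain copy of the valid bytes.
    static byte[] copyTruncated(byte[] utf8, int len, int maxChars) {
        return Arrays.copyOf(utf8, truncationOffset(utf8, len, maxChars));
    }

    public static void main(String[] args) {
        byte[] src = "h\u00e9llo w\u00f6rld".getBytes(StandardCharsets.UTF_8);
        byte[] out = copyTruncated(src, src.length, 5);
        // first 5 code points, 6 bytes (the accented e takes two bytes)
        System.out.println(out.length + " bytes: " + new String(out, StandardCharsets.UTF_8));
    }
}
```

When the value already fits, `truncationOffset` returns `len` and the method reduces to `Arrays.copyOf`; the scan only classifies each byte as a lead or continuation byte, so no `char[]` or `String` is created along the way.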

> Optimise ObjectInspectorUtils.copyToStandardObject
> --------------------------------------------------
>
>                 Key: HIVE-24688
>                 URL: https://issues.apache.org/jira/browse/HIVE-24688
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>         Attachments: Screen Shot 2021-01-27 at 9.52.32 AM.png
>
>
> It's not necessarily copyToStandardObject that should be optimized, but we
> need to consider some optimization on the attached code path.
> In a customer case, 3 reducer tasks run forever (handling skewed keys) and
> most of the time is spent on this code path, with heavy GC activity. At the
> moment I'm open to any kind of optimization:
>  1. Do we need to copy Text? Can't we get a reference back?
> !Screen Shot 2021-01-27 at 9.52.32 AM.png|width=652,height=280!
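The comment's point that the copy cannot simply be dropped, because the writable is reused in the RowContainer loop, also bears on the question of returning a reference to the Text. The toy model here is a hypothetical illustration, not Hive's RowContainer: `ReusableBuffer` stands in for a reused Writable, and `readRows` for a read loop that deserializes every row into the same object and stores either that reference or a copy of it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of a read loop that reuses one mutable object per row.
final class ReuseAliasingDemo {

    // Stand-in for a reusable Writable: its contents are overwritten per row.
    static final class ReusableBuffer {
        private final StringBuilder value = new StringBuilder();

        void set(String s) { value.setLength(0); value.append(s); }

        String get() { return value.toString(); }

        ReusableBuffer copy() {
            ReusableBuffer c = new ReusableBuffer();
            c.set(get());
            return c;
        }
    }

    // "Deserializes" every row into the same object, then stores either the
    // shared reference or an independent copy in the block.
    static List<String> readRows(String[] rows, boolean copy) {
        ReusableBuffer reused = new ReusableBuffer();
        List<ReusableBuffer> block = new ArrayList<>();
        for (String row : rows) {
            reused.set(row);
            block.add(copy ? reused.copy() : reused);
        }
        List<String> out = new ArrayList<>();
        for (ReusableBuffer b : block) {
            out.add(b.get());
        }
        return out;
    }

    public static void main(String[] args) {
        String[] rows = {"a", "b", "c"};
        System.out.println(readRows(rows, false)); // [c, c, c]
        System.out.println(readRows(rows, true));  // [a, b, c]
    }
}
```

Storing the reference leaves every slot pointing at the last row; only the copy keeps each stored row intact once the shared object is overwritten, which is the constraint the comment describes for the RowContainer loop.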



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
