[ https://issues.apache.org/jira/browse/HIVE-18252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16341610#comment-16341610 ]

Jason Dere commented on HIVE-18252:
-----------------------------------

[~ashutoshc] I took another look at this and tried removing the caching for 
complex object inspectors. That produced the following error:

{noformat}
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Conflict on row inspector for srcpart
        at org.apache.hadoop.hive.ql.exec.MapOperator.initOperatorContext(MapOperator.java:473)
        at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:457)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:286)
        ... 16 more
{noformat}

Is it safe to remove the check that is being done in 
MapOperator.initOperatorContext() (the frame that throws in the trace above)? 
Or do we need to implement equals() for the object inspectors?
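
To make the question concrete, here is a minimal sketch of how such a check can start failing once caching is removed. It is not the actual MapOperator code: SketchStructInspector and RowInspectorRegistry are hypothetical stand-ins, and the sketch only assumes that the check compares inspectors with equals() and that the inspectors do not override it.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a struct object inspector. It deliberately does
// NOT override equals(), so comparison falls back to object identity.
final class SketchStructInspector {
    final List<String> fieldNames;

    SketchStructInspector(List<String> fieldNames) {
        this.fieldNames = fieldNames;
    }
}

// Hypothetical stand-in for a per-alias consistency check (not Hive code).
final class RowInspectorRegistry {
    private final Map<String, SketchStructInspector> byAlias = new HashMap<>();

    void register(String alias, SketchStructInspector inspector) {
        SketchStructInspector prev = byAlias.putIfAbsent(alias, inspector);
        // Without an equals() override this is effectively (prev != inspector).
        if (prev != null && !prev.equals(inspector)) {
            throw new IllegalStateException("Conflict on row inspector for " + alias);
        }
    }
}

final class RowInspectorConflictDemo {
    public static void main(String[] args) {
        RowInspectorRegistry registry = new RowInspectorRegistry();

        // With caching, every lookup hands back the same instance, so the check passes.
        SketchStructInspector cached = new SketchStructInspector(Arrays.asList("key", "value"));
        registry.register("srcpart", cached);
        registry.register("srcpart", cached);

        // Without caching, a structurally identical but distinct instance is created.
        SketchStructInspector fresh = new SketchStructInspector(Arrays.asList("key", "value"));
        try {
            registry.register("srcpart", fresh);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under these assumptions, the check only passes today because the cache returns the same instance each time. Removing the cache would then require either dropping the check or giving the inspectors a structural equals().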

I also noticed that ObjectInspectorConverters compares object inspectors, so 
if two equivalent object inspectors do not compare as equal, Hive might end up 
creating unnecessary converters instead of just using the IdentityConverter.

This may mean that we do have to implement equals() for the object 
inspectors.
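
As a rough illustration of the converter concern, the sketch below picks an identity converter only when the two inspectors compare as equal. All types here are hypothetical simplifications, not the Hive classes. IdentityOnlyInspector models an inspector without equals(), and StructuralInspector models one that compares by field types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical converter interface, loosely modelled on the idea of a row converter.
interface SketchConverter {
    List<Object> convert(List<Object> row);
}

// Returns the input unchanged: no per-row work.
final class SketchIdentityConverter implements SketchConverter {
    public List<Object> convert(List<Object> row) {
        return row;
    }
}

// Rebuilds the row: stands in for the unnecessary conversion work.
final class SketchCopyingConverter implements SketchConverter {
    public List<Object> convert(List<Object> row) {
        return new ArrayList<>(row);
    }
}

// No equals() override: two instances with the same field types are "different".
final class IdentityOnlyInspector {
    final List<String> fieldTypes;

    IdentityOnlyInspector(List<String> fieldTypes) {
        this.fieldTypes = fieldTypes;
    }
}

// Structural equals(): instances with the same field types compare as equal.
final class StructuralInspector {
    final List<String> fieldTypes;

    StructuralInspector(List<String> fieldTypes) {
        this.fieldTypes = fieldTypes;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof StructuralInspector
                && fieldTypes.equals(((StructuralInspector) o).fieldTypes);
    }

    @Override
    public int hashCode() {
        return fieldTypes.hashCode();
    }
}

final class SketchConverters {
    // Identity converter only when the inspectors compare as equal.
    static SketchConverter getConverter(Object inputOI, Object outputOI) {
        if (inputOI.equals(outputOI)) {
            return new SketchIdentityConverter();
        }
        return new SketchCopyingConverter();
    }

    public static void main(String[] args) {
        List<String> types = Arrays.asList("int", "string");
        System.out.println(getConverter(
                new IdentityOnlyInspector(types), new IdentityOnlyInspector(types))
                .getClass().getSimpleName());
        System.out.println(getConverter(
                new StructuralInspector(types), new StructuralInspector(types))
                .getClass().getSimpleName());
    }
}
```

In this sketch, the identity-only inspectors fall through to the copying converter even though the layouts match, while the structural ones get the identity converter. The cost of a deep equals() on nested inspectors is not modelled here.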

> Limit the size of the object inspector caches
> ---------------------------------------------
>
>                 Key: HIVE-18252
>                 URL: https://issues.apache.org/jira/browse/HIVE-18252
>             Project: Hive
>          Issue Type: Bug
>          Components: Types
>            Reporter: Jason Dere
>            Assignee: Jason Dere
>            Priority: Major
>         Attachments: HIVE-18252.1.patch
>
>
> Was running some tests that had a lot of queries with constant values, and 
> noticed that ObjectInspectorFactory.cachedStandardStructObjectInspector 
> started using up a lot of memory.
> It appears that StructObjectInspector caching does not work properly with 
> constant values. Constant ObjectInspectors are not cached, so each constant 
> expression creates a new constant ObjectInspector. And since object 
> inspectors do not override equals(), object inspector comparison relies on 
> object instance comparison. So even if the values are exactly the same as 
> what is already in the cache, the StructObjectInspector cache lookup would 
> fail, and Hive would create a new object inspector and add it to the cache, 
> creating another entry that would never be used. Plus, there is no max cache 
> size - it's just a map that is allowed to grow as long as values keep getting 
> added to it.
> Some possible solutions I can think of:
> 1. Limit the size of the object inspector caches, rather than growing without 
> bound.
> 2. Try to fix the caching to work with constant values. This would require 
> implementing equals() on the constant object inspectors (which could be slow 
> in nested cases), or else we would have to start caching constant object 
> inspectors, which could be expensive in terms of memory usage. Could be used 
> in combination with (1). By itself this is not a great solution because this 
> still has the unbounded cache growth issue.
> 3. Disable caching in the case of constant object inspectors since this 
> scenario currently doesn't work. This could be used in combination with (1).
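
For option (1), one possible shape of a size-limited cache is sketched below, using an access-ordered LinkedHashMap that evicts its least recently used entry once a cap is exceeded. This is only an illustration and not what the attached patch does. BoundedInspectorCache and its methods are hypothetical names, and plain Object keys stand in for constant object inspectors that never compare as equal.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical bounded cache: least-recently-used entries are evicted once
// the number of entries exceeds maxEntries.
final class BoundedInspectorCache<K, V> {
    private final LinkedHashMap<K, V> map;

    BoundedInspectorCache(final int maxEntries) {
        // accessOrder = true makes iteration order least-recently-accessed first.
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // Returns the cached value for key, creating and caching it on a miss.
    synchronized V get(K key, Function<K, V> factory) {
        V value = map.get(key);
        if (value == null) {
            value = factory.apply(key);
            map.put(key, value);
        }
        return value;
    }

    synchronized int size() {
        return map.size();
    }
}

final class BoundedInspectorCacheDemo {
    public static void main(String[] args) {
        BoundedInspectorCache<Object, String> cache = new BoundedInspectorCache<>(100);

        // Each key is a fresh Object with identity-based equals(), mimicking
        // constant object inspectors: every lookup is a miss.
        for (int i = 0; i < 10_000; i++) {
            cache.get(new Object(), k -> "inspector");
        }

        // The misses still happen, but the cache can no longer grow without bound.
        System.out.println(cache.size());
    }
}
```

A cap like this bounds memory but does not fix the misses themselves. Every constant expression still creates an entry that is never reused, which is why (1) is described above as something to combine with (2) or (3).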



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
