[ 
https://issues.apache.org/jira/browse/HIVE-28609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-28609:
--------------------------------
    Description: HIVE-28530 introduced a ThreadLocal for storing files in 
HiveSequenceFileInputFormat because there was contention when accessing the 
files in a shared/cached instance. I feel we fixed the problem in the wrong 
place. Instead of preventing this instance from being cached, the fix 
introduced a ThreadLocal, which seems hacky and makes the code reader think 
that the input format instance must be cached, which is not the case. This 
format class is instantiated by 
[reflection|https://github.com/apache/hive/blob/18f34e75da0141d37d9a8f1cef4f7f64ba09fadb/ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java#L229],
 and reflectively created instances are quite often cached for performance 
reasons. We can still cache an instance and clone it (maybe by implementing 
some interface) to keep performance.  (was: HIVE-28530 introduces a 
ThreadLocal for storing files in 
HiveSequenceFileInputFormat because there was a contention while accessing the 
files in a shared/cached instance. I feel we fixed a problem in a bad place. 
Instead of preventing this instance from being cached, it introduced a 
ThreadLocal, which seems weird and hacky and makes the code reader think that 
the input format instance must be cached, whereas it's not. This format class 
is instantiated through reflection, which is quite often cached due to 
performance reasons. We can still cache the instance and clone it (maybe by 
implementing some interface) to keep performance.)

> HiveSequenceFileInputFormat should be cloned or not be cached
> -------------------------------------------------------------
>
>                 Key: HIVE-28609
>                 URL: https://issues.apache.org/jira/browse/HIVE-28609
>             Project: Hive
>          Issue Type: Improvement
>      Security Level: Public(Viewable by anyone) 
>            Reporter: László Bodor
>            Priority: Major
>
> HIVE-28530 introduced a ThreadLocal for storing files in 
> HiveSequenceFileInputFormat because there was contention when accessing 
> the files in a shared/cached instance. I feel we fixed the problem in the 
> wrong place. Instead of preventing this instance from being cached, the fix 
> introduced a ThreadLocal, which seems hacky and makes the code reader think 
> that the input format instance must be cached, which is not the case. This 
> format class is instantiated by 
> [reflection|https://github.com/apache/hive/blob/18f34e75da0141d37d9a8f1cef4f7f64ba09fadb/ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java#L229],
>  and reflectively created instances are quite often cached for performance 
> reasons. We can still cache an instance and clone it (maybe by implementing 
> some interface) to keep performance.
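
To illustrate the "cache an instance and clone it" idea from the description, 
here is a minimal Java sketch. It is not Hive code: Copyable, 
SketchInputFormat and PrototypeCache are hypothetical names invented for this 
example, and the real HiveSequenceFileInputFormat and FetchOperator are not 
used. The assumption is that the reflective instantiation is the part worth 
caching, while per-use state (the file list) must not be shared.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical interface (not part of Hive or Hadoop): a format that can
// hand out a fresh copy of itself.
interface Copyable<T> {
    T copy();
}

// Hypothetical stand-in for an input format that keeps per-use mutable state.
class SketchInputFormat implements Copyable<SketchInputFormat> {
    private final List<String> files = new ArrayList<>();

    SketchInputFormat() {
        // No-arg constructor, required by the reflective instantiation below.
    }

    void addFile(String path) {
        files.add(path);
    }

    List<String> getFiles() {
        return new ArrayList<>(files);
    }

    @Override
    public SketchInputFormat copy() {
        // Shared configuration would be copied here; the file list is
        // deliberately left empty so callers never share it.
        return new SketchInputFormat();
    }
}

// Hypothetical cache: one reflectively created prototype per class,
// and a plain copy() call for every caller.
final class PrototypeCache {
    private static final Map<Class<?>, Copyable<?>> PROTOTYPES =
        new ConcurrentHashMap<>();

    private PrototypeCache() {
    }

    static <T extends Copyable<T>> T newInstance(Class<T> clazz) {
        Copyable<?> prototype =
            PROTOTYPES.computeIfAbsent(clazz, PrototypeCache::instantiate);
        return clazz.cast(prototype.copy());
    }

    private static Copyable<?> instantiate(Class<?> clazz) {
        try {
            return (Copyable<?>) clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(
                "Cannot instantiate " + clazz.getName(), e);
        }
    }
}
```

With this shape, two callers asking for the same class get distinct objects, 
so a file added through one instance is not visible through the other, and no 
ThreadLocal is needed to isolate the state. Whether copy() is actually cheaper 
than a reflective newInstance() in Hive would have to be measured.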



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
