[ 
https://issues.apache.org/jira/browse/HIVE-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12870088#action_12870088
 ] 

Ning Zhang commented on HIVE-1348:
----------------------------------

Yongqiang, I still think we should just call the function inputFileChanged() or 
checkInputFileChanged(). The reasons are:
 1) The input file can change from one record to the next, so the only place 
you would want to cache the result is inside processOp() itself, not across 
rows. Caching the variable in ExecMapperContext doesn't make sense, since it is 
shared across all operators and all iterations. Making the variable 
inputFileChanged public and letting other classes change it doesn't make sense 
either, and could easily lead to incorrect results. 
 2) The only use of the variable ExecMapperContext.inputFileChanged is in 
SMBMapJoinOperator.processOp() and close(). It doesn't need to be cached in 
ExecMapperContext (or even in a local class variable) at all IMO. Can you 
explain why?
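
To make the suggestion concrete, here is a minimal sketch of what is meant by 
calling the check directly and keeping its result local to processOp(). It is 
not the actual Hive code: the class SmbJoinSketch, the field lastInputFile, the 
list reloads, and passing the current file path as a parameter are all 
assumptions made for illustration. In Hive the current path would come from the 
job configuration, and the operator still has to remember the previous path 
somewhere in order to detect a change.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an operator such as SMBMapJoinOperator (not Hive's class).
class SmbJoinSketch {
    // Path of the file the previous row came from; null before the first row.
    private String lastInputFile = null;

    // Files for which per-file setup was triggered; kept only so the sketch is observable.
    final List<String> reloads = new ArrayList<>();

    // Compares the current input file with the last one seen and remembers the new one.
    // Assumption: the caller supplies the path; the real code would look it up itself.
    private boolean inputFileChanged(String currentInputFile) {
        boolean changed = !currentInputFile.equals(lastInputFile);
        lastInputFile = currentInputFile;
        return changed;
    }

    // Processes one row. The result of the check lives only in a local variable,
    // so no other operator can read a stale value or overwrite it.
    boolean processOp(String currentInputFile, Object row) {
        boolean changed = inputFileChanged(currentInputFile);
        if (changed) {
            // Per-file setup (e.g. switching to the matching bucket) would go here.
            reloads.add(currentInputFile);
        }
        // ... join logic for `row` would go here ...
        return changed;
    }
}
```

With this shape, only the operator that needs the check pays for it, and the 
flag is never shared across operators or rows.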


> Moving inputFileChanged() from ExecMapper to where it is needed
> ---------------------------------------------------------------
>
>                 Key: HIVE-1348
>                 URL: https://issues.apache.org/jira/browse/HIVE-1348
>             Project: Hadoop Hive
>          Issue Type: Improvement
>            Reporter: Ning Zhang
>            Assignee: He Yongqiang
>         Attachments: hive-1348.1.patch, hive-1348.2.patch, hive-1348.3.patch
>
>
> inputFileChanged() is only needed for bucketed sort-merge map join. It should 
> not be put in ExecMapper.map(), where all code paths will hit this function. 
> This function is quite expensive, since a JobConf lookup is a hash table 
> lookup. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
