[ 
https://issues.apache.org/jira/browse/HIVE-21177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16755453#comment-16755453
 ] 

Eugene Koifman edited comment on HIVE-21177 at 1/29/19 10:54 PM:
-----------------------------------------------------------------

I added checks so that we don't look for the side file if we don't have to.

We have another issue.  Operations like Load Data/Add Partition create 
base/delta directories and place 'raw' (aka 'original' schema) files there.  
Split gen and the read path need to know what schema to expect in a given 
file/split.  Nothing in the file path indicates this, so the code opens one of 
the data files in base/delta to determine it: {{AcidUtils.isRawFormat()}}.

This should be less of an issue, since it does a listing first to choose the 
file, so it should never be looking for a file that is not actually there.  I 
optimized {{isRawFormat()}} somewhat, but it will still do the checks a lot of 
the time.  It could be changed to rely on the file name instead, but that's 
rather fragile.





> Optimize AcidUtils.getLogicalLength()
> -------------------------------------
>
>                 Key: HIVE-21177
>                 URL: https://issues.apache.org/jira/browse/HIVE-21177
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 3.0.0
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman
>            Priority: Major
>         Attachments: HIVE-21177.01.patch
>
>
> {{AcidUtils.getLogicalLength()}} tries to look for the side file 
> ({{OrcAcidUtils.getSideFile()}}) on the file system even when the file 
> couldn't possibly be there, e.g. when the path is delta_x_x or base_x.  It 
> could only be there in delta_x_y, x != y.
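
The directory-name rule in the description can be sketched roughly as below. This is only an illustration and not the code in the attached patch: the class SideFileCheck, the method mayHaveSideFile() and the regular expression are all hypothetical, and the sketch assumes directory names of the form delta_<x>_<y> with an optional trailing suffix.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hive: decides from the directory name alone
// whether a side file lookup is worth a file system call.
class SideFileCheck {
    // Assumed naming: delta_<x>_<y>, optionally followed by "_<suffix>".
    private static final Pattern DELTA =
        Pattern.compile("^delta_(\\d+)_(\\d+)(_.*)?$");

    static boolean mayHaveSideFile(String dirName) {
        Matcher m = DELTA.matcher(dirName);
        if (!m.matches()) {
            // base_x and any other name: per the description, no side file.
            return false;
        }
        long x = Long.parseLong(m.group(1));
        long y = Long.parseLong(m.group(2));
        // delta_x_x cannot have a side file; only delta_x_y with x != y can.
        return x != y;
    }

    public static void main(String[] args) {
        String[] names = {"base_0000005", "delta_0000007_0000007", "delta_0000001_0000010"};
        for (String name : names) {
            System.out.println(name + " -> " + mayHaveSideFile(name));
        }
    }
}
```

A caller would do the actual side file lookup only when such a check returns true, and could otherwise fall back to the length reported for the file itself.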



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
