[ 
https://issues.apache.org/jira/browse/HIVE-12450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-12450:
-----------------------------------------
    Description: 
OrcFileMergeOperator checks for compatibility before merging ORC files. This 
compatibility check includes checking the compression buffer size. But the 
output file that is created does not honor the compression buffer size and 
always defaults to 256KB. This will not be a problem when reading the ORC file 
but can create unwanted memory pressure because of wasted space within the 
compression buffer.

This issue can also make the merged file unreadable in certain cases. For 
example, if the original compression buffer size is 8KB and 
hive.exec.orc.default.buffer.size is set to 4KB, the merge file operator will 
use 4KB instead of the actual 8KB, which can cause the ORC reader to hang (more 
specifically, ZlibCodec will wait for more compression buffers). 

  was:OrcFileMergeOperator checks for compatibility before merging ORC files. 
This compatibility check includes checking the compression buffer size. But the 
output file that is created does not honor the compression buffer size and 
always defaults to 256KB. This will not be a problem when reading the ORC file 
but can create unwanted memory pressure because of wasted space within the 
compression buffer.


> OrcFileMergeOperator does not use correct compression buffer size
> -----------------------------------------------------------------
>
>                 Key: HIVE-12450
>                 URL: https://issues.apache.org/jira/browse/HIVE-12450
>             Project: Hive
>          Issue Type: Bug
>          Components: ORC
>    Affects Versions: 1.2.0, 1.3.0, 1.2.1, 2.0.0
>            Reporter: Prasanth Jayachandran
>            Assignee: Prasanth Jayachandran
>            Priority: Critical
>         Attachments: HIVE-12450.1.patch
>
>
> OrcFileMergeOperator checks for compatibility before merging ORC files. This 
> compatibility check includes checking the compression buffer size. But the 
> output file that is created does not honor the compression buffer size and 
> always defaults to 256KB. This will not be a problem when reading the ORC 
> file but can create unwanted memory pressure because of wasted space within 
> the compression buffer.
> This issue can also make the merged file unreadable in certain cases. For 
> example, if the original compression buffer size is 8KB and 
> hive.exec.orc.default.buffer.size is set to 4KB, the merge file operator will 
> use 4KB instead of the actual 8KB, which can cause the ORC reader to hang 
> (more specifically, ZlibCodec will wait for more compression buffers). 
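
The rule the description implies can be sketched as follows: once the inputs 
have passed the compatibility check, the merged file should reuse their 
compression buffer size instead of the configured default. This is only an 
illustration under stated assumptions, not the attached patch. The class 
MergeBufferSize, the type InputFileInfo, and the method chooseOutputBufferSize 
are hypothetical names and are not part of Hive or ORC.

```java
import java.util.Arrays;
import java.util.List;

public class MergeBufferSize {

    /** Hypothetical stand-in for one input ORC file's metadata; not a Hive class. */
    public static final class InputFileInfo {
        final String path;
        final int compressionBufferSize; // bytes, as recorded in that file's footer

        public InputFileInfo(String path, int compressionBufferSize) {
            this.path = path;
            this.compressionBufferSize = compressionBufferSize;
        }
    }

    /**
     * Returns the buffer size the merged file should declare. The inputs must
     * agree on a single size; the configured default is used only when there
     * are no inputs to take a size from.
     */
    public static int chooseOutputBufferSize(List<InputFileInfo> inputs,
                                             int configuredDefault) {
        if (inputs.isEmpty()) {
            return configuredDefault;
        }
        int expected = inputs.get(0).compressionBufferSize;
        for (InputFileInfo input : inputs) {
            if (input.compressionBufferSize != expected) {
                throw new IllegalArgumentException(
                    "Incompatible compression buffer size in " + input.path);
            }
        }
        return expected;
    }

    public static void main(String[] args) {
        // Scenario from the description: inputs written with 8KB buffers,
        // while the configured default is 4KB.
        int configuredDefault = 4 * 1024;
        List<InputFileInfo> inputs = Arrays.asList(
            new InputFileInfo("part-0.orc", 8 * 1024),
            new InputFileInfo("part-1.orc", 8 * 1024));
        // The merged file keeps the inputs' size rather than the default.
        System.out.println(chooseOutputBufferSize(inputs, configuredDefault));
    }
}
```

In the 8KB/4KB scenario above, this selection keeps the declared size at 8KB, 
so a reader sizing its buffers from the merged file's metadata would not end up 
with buffers smaller than the chunks that were copied over.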



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
