[ 
https://issues.apache.org/jira/browse/HIVE-14800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15507998#comment-15507998
 ] 

Siddharth Seth commented on HIVE-14800:
---------------------------------------

They are valid splits. However, it should be possible to make them consistent 
when the splits are generated by ORC itself: special-case either BI or ETL so 
that it generates the same starting split for a file as the other strategy.

As for the hashCode used for consistent splits, that should be independent of 
the file format.
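
A rough sketch of the idea, in Java, under stated assumptions: the class and 
method names (SplitOffsets, normalize, consistentHash) are hypothetical and 
are not existing Hive or ORC APIs. The only fact relied on is that an ORC file 
begins with the 3-byte magic "ORC". The ORC side maps a first split that 
starts at or before offset 3 back to offset 0 and keeps its end offset 
unchanged, while the hash uses only the path and start offset and knows 
nothing about the format.

```java
import java.util.Objects;

/** Hypothetical helper for illustration; not part of Hive or ORC. */
public final class SplitOffsets {
    /** Length of the "ORC" magic at the start of an ORC file. */
    public static final long ORC_HEADER_LENGTH = 3;

    private SplitOffsets() {}

    /**
     * ORC-side step: if a split starts inside or right after the header,
     * move its start to 0 and grow its length so the end offset is unchanged.
     * Returns {start, length}.
     */
    public static long[] normalize(long start, long length) {
        if (start <= ORC_HEADER_LENGTH) {
            return new long[] {0L, length + start};
        }
        return new long[] {start, length};
    }

    /** Format-independent step: hash only the file path and start offset. */
    public static int consistentHash(String path, long start) {
        return Objects.hash(path, start);
    }
}
```

With this, a BI-style first split (start 0, length 1000) and an ETL-style 
first split (start 3, length 997) of the same file normalize to the same 
start and length, and so produce the same hash.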

> Handle off by 3 in ORC split generation based on split strategy used
> --------------------------------------------------------------------
>
>                 Key: HIVE-14800
>                 URL: https://issues.apache.org/jira/browse/HIVE-14800
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Siddharth Seth
>
> BI will apparently generate splits starting at offset 0.
> ETL will skip the ORC header and generate a split starting at offset 3.
> There's a workaround in the HiveSplitGenerator to handle this for consistent 
> splits. Ideally, ORC split generation should take care of this.
> cc [~prasanth_j], [~gopalv]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)