[ 
https://issues.apache.org/jira/browse/HIVE-5102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760410#comment-13760410
 ] 

Phabricator commented on HIVE-5102:
-----------------------------------

omalley has commented on the revision "HIVE-5102 [jira] ORC getSplits should 
create splits based the stripes".

INLINE COMMENTS
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java:247 +1
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java:261 Yes, but 
you'd need to carefully consider the impact of globalizing the thread pool.
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java:262 We could 
do either. The final return type is InputSplit[], so this seemed more natural.
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java:288 I should 
add a comment, but this is there for testing. In the test code, it is useful to 
be able to pass -1 to get the last split added.
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java:329 This is 
pretty standard for cases where there isn't a natural timeout. The only way for 
the threads to not finish is if HDFS is hanging, which would also have caused 
the previous code to hang forever.
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java:276 This handles 
the case where the byte is between 0x80 and 0xff. Natural promotion to int would 
sign-extend those values to negative numbers, which isn't desired.
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java:285-288 
This patch changes the desired behavior. In the previous version, an ORC file 
with 0 rows still created a single input split, which is what this test was 
testing. With the modified test, we ensure that we get no splits, which is the 
new expected behavior.
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java:69 +1
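
As an illustration of the ReaderImpl.java:276 comment above, here is a minimal 
sketch of the sign-extension issue. It is not the code from the patch; the 
class name UnsignedByteDemo and the helper toUnsigned are hypothetical and only 
show why a byte is typically masked with 0xff before being used as an int.

```java
public class UnsignedByteDemo {
    // Hypothetical helper: widen a byte to int without sign extension.
    // The mask keeps only the low 8 bits, so 0x80..0xff stay in 128..255.
    public static int toUnsigned(byte b) {
        return b & 0xff;
    }

    public static void main(String[] args) {
        byte raw = (byte) 0x90;        // bit pattern 1001 0000
        int promoted = raw;            // plain promotion sign-extends the byte
        int masked = toUnsigned(raw);  // masking preserves the unsigned value
        System.out.println("promoted=" + promoted + " masked=" + masked);
    }
}
```

Without the mask, any length or offset read from a byte at or above 0x80 would 
come out negative.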

REVISION DETAIL
  https://reviews.facebook.net/D12579

BRANCH
  h-5102

ARCANIST PROJECT
  hive

To: JIRA, ashutoshc, omalley

                
> ORC getSplits should create splits based the stripes 
> -----------------------------------------------------
>
>                 Key: HIVE-5102
>                 URL: https://issues.apache.org/jira/browse/HIVE-5102
>             Project: Hive
>          Issue Type: Bug
>          Components: File Formats
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>         Attachments: HIVE-5102.D12579.1.patch, HIVE-5102.D12579.2.patch
>
>
> Currently ORC inherits getSplits from FileInputFormat, which basically makes 
> a split per HDFS block. This can create too little parallelism and would be 
> better done by having getSplits look at the file footer and create splits 
> based on the stripes.
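
The idea in the description above can be sketched roughly as follows. This is 
an assumption-laden illustration, not the algorithm in the attached patches: 
Stripe, Split, splitsFor, and the minSplitSize threshold are all hypothetical 
stand-ins rather than Hive classes or settings. The sketch groups consecutive 
stripes (as they might be listed in a file footer) into splits that always 
begin and end on stripe boundaries, and returns no splits for a file with no 
stripes.

```java
import java.util.ArrayList;
import java.util.List;

public class StripeSplitSketch {
    // Hypothetical stand-in for one stripe entry read from a file footer.
    public static final class Stripe {
        public final long offset;
        public final long length;
        public Stripe(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }
    }

    // Hypothetical stand-in for an input split: a byte range of the file.
    public static final class Split {
        public final long start;
        public final long length;
        public Split(long start, long length) {
            this.start = start;
            this.length = length;
        }
    }

    // Group consecutive stripes into splits of at least minSplitSize bytes.
    // Every split starts and ends on a stripe boundary.
    public static List<Split> splitsFor(List<Stripe> stripes, long minSplitSize) {
        List<Split> splits = new ArrayList<>();
        long start = -1;   // -1 means no split is currently being built
        long length = 0;
        for (Stripe s : stripes) {
            if (start < 0) {
                start = s.offset;
            }
            length = s.offset + s.length - start;
            if (length >= minSplitSize) {
                splits.add(new Split(start, length));
                start = -1;
                length = 0;
            }
        }
        if (start >= 0) {
            // Trailing stripes that never reached the threshold.
            splits.add(new Split(start, length));
        }
        return splits;
    }
}
```

Because the splits follow stripe boundaries rather than HDFS block boundaries, 
the number of splits tracks the number of stripes and the chosen threshold, 
and an empty stripe list produces an empty result.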

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
