[ https://issues.apache.org/jira/browse/HIVE-5102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13759624#comment-13759624 ]
Phabricator commented on HIVE-5102:
-----------------------------------

ashutoshc has requested changes to the revision "HIVE-5102 [jira] ORC getSplits should create splits based the stripes".

  Looks good overall. Some minor comments.

INLINE COMMENTS
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java:276 Didn't get why we need to & 0xff here?
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java:247 I think it would be good to throw an exception if dirs.isEmpty(), because we spent a day debugging a problem where Tez updated code but didn't have this config variable in their conf.
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java:261 There are a few places in the Hive codebase that create a thread pool. We should unify all of them, but that's probably a topic for another jira.
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java:69 It would be good to add a comment for this field.
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java:262 Good to name this field splits. Also, should it be List<FileSplit> instead?
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java:288 Why should we allow this? Isn't the caller passing in a wrong argument in those cases?
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java:329 This seems to introduce the possibility of a process hang. Is there a better way of doing things here?
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java:523 Add @Override annotation?
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java:285-288 This change shouldn't have made this test fail. Any reason for deleting this?
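
On the first inline comment: the code at ReaderImpl.java:276 is not shown in this message, so the following is only a guess at the intent. In Java, byte is signed, so widening a byte to int sign-extends it; masking with 0xff is the usual way to read the byte as an unsigned value in 0..255. A minimal sketch, where the class name and the unsignedLength helper are hypothetical and not taken from the patch:

```java
public class UnsignedByteDemo {
    // Hypothetical helper: interpret one byte as an unsigned value (0..255).
    static int unsignedLength(byte b) {
        // The mask keeps only the low 8 bits after the byte is widened to int.
        return b & 0xff;
    }

    public static void main(String[] args) {
        byte raw = (byte) 200;             // stored as -56 because Java bytes are signed
        int signExtended = raw;            // plain widening keeps the sign
        int masked = unsignedLength(raw);  // masking recovers the unsigned value
        System.out.println(signExtended + " vs " + masked);
    }
}
```

If the byte in question encodes a length or count, skipping the mask would yield a negative number for any value above 127.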

REVISION DETAIL
  https://reviews.facebook.net/D12579

BRANCH
  h-5102

ARCANIST PROJECT
  hive

To: JIRA, ashutoshc, omalley


> ORC getSplits should create splits based the stripes
> -----------------------------------------------------
>
>                 Key: HIVE-5102
>                 URL: https://issues.apache.org/jira/browse/HIVE-5102
>             Project: Hive
>          Issue Type: Bug
>          Components: File Formats
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>         Attachments: HIVE-5102.D12579.1.patch, HIVE-5102.D12579.2.patch
>
>
> Currently ORC inherits getSplits from FileFormat, which basically makes a
> split per an HDFS block. This can create too little parallelism and would be
> better done by having getSplits look at the file footer and create splits
> based on the stripes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
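
The difference described in the issue text (one split per HDFS block versus splits derived from the stripes listed in the file footer) can be sketched roughly as follows. This is a simplified illustration, not the code in the attached patches: the Stripe and Split classes and both methods are hypothetical stand-ins, and the real change may group or filter stripes differently.

```java
import java.util.ArrayList;
import java.util.List;

public class StripeSplitSketch {
    /** Hypothetical stand-in for the stripe metadata recorded in a file footer. */
    static final class Stripe {
        final long offset;
        final long length;
        Stripe(long offset, long length) { this.offset = offset; this.length = length; }
    }

    /** Hypothetical stand-in for a file split: a byte range within one file. */
    static final class Split {
        final long start;
        final long length;
        Split(long start, long length) { this.start = start; this.length = length; }
    }

    /** Block-based splitting (simplified): cut the file every blockSize bytes. */
    static List<Split> splitsByBlock(long fileLength, long blockSize) {
        List<Split> result = new ArrayList<>();
        for (long start = 0; start < fileLength; start += blockSize) {
            result.add(new Split(start, Math.min(blockSize, fileLength - start)));
        }
        return result;
    }

    /** Stripe-based splitting: one split per stripe, using the footer's offsets. */
    static List<Split> splitsByStripe(List<Stripe> stripes) {
        List<Split> result = new ArrayList<>();
        for (Stripe s : stripes) {
            result.add(new Split(s.offset, s.length));
        }
        return result;
    }

    public static void main(String[] args) {
        // Four 64-byte stripes in a 256-byte file that fits in a single 256-byte block.
        List<Stripe> stripes = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            stripes.add(new Stripe(i * 64L, 64L));
        }
        System.out.println("block-based splits:  " + splitsByBlock(256L, 256L).size());
        System.out.println("stripe-based splits: " + splitsByStripe(stripes).size());
    }
}
```

With these made-up sizes, the block-based approach yields a single split while the stripe-based approach yields four, which is the parallelism gap the issue describes.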