[ https://issues.apache.org/jira/browse/HIVE-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14900021#comment-14900021 ]
Illya Yalovyy commented on HIVE-11882:
--------------------------------------

SimpleFetchOptimizer.FetchData.calculateLength() doesn't respect the threshold when calculating the sizes of partitions:
{code:java}
for (Partition partition : partsList.getNotDeniedPartns()) {
  Path path = partition.getDataLocation();
  total += getFileLength(jobConf, path, partition.getInputFormatClass());
}
{code}
I'll work on it next week.

> Fetch optimizer should stop source files traversal once it exceeds the
> hive.fetch.task.conversion.threshold
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-11882
>                 URL: https://issues.apache.org/jira/browse/HIVE-11882
>             Project: Hive
>          Issue Type: Improvement
>          Components: Physical Optimizer
>    Affects Versions: 1.0.0
>            Reporter: Illya Yalovyy
>
> Hive 1.0's fetch optimizer tries to optimize queries of the form "select <C>
> from <T> where <F> limit <L>" into a fetch task (see the
> hive.fetch.task.conversion property). This optimization gets the lengths of
> all the files in the specified partition and compares the total against a
> threshold value to determine whether it should use a fetch task or not (see
> the hive.fetch.task.conversion.threshold property). One of the main problems
> with this optimization is that the fetch optimizer doesn't seem to stop
> getting file lengths once the total exceeds
> hive.fetch.task.conversion.threshold. It works fine on HDFS, but could cause
> a significant performance degradation on other supported file systems.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
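To illustrate the fix the comment suggests, here is a minimal sketch of an early-exit version of the summing loop. It is not the actual Hive code: the class name, the use of a plain list of file lengths in place of partition metadata, and the explicit threshold parameter are all assumptions made so the example is self-contained.

{code:java}
import java.util.Arrays;
import java.util.List;

public class EarlyExitLength {
  // Sums file lengths but stops as soon as the running total exceeds the
  // threshold: once it is over, the fetch-task conversion will be rejected
  // anyway, so listing further files is wasted work.
  static long calculateLength(List<Long> fileLengths, long threshold) {
    long total = 0;
    for (long len : fileLengths) {
      total += len;
      if (total > threshold) {
        break;  // early exit: remaining files are never visited
      }
    }
    return total;
  }

  public static void main(String[] args) {
    List<Long> lengths = Arrays.asList(100L, 200L, 5000L, 300L);
    long threshold = 1000L;
    // Stops after the third entry (100 + 200 + 5000 = 5300 > 1000);
    // the fourth file's length is never fetched.
    System.out.println(calculateLength(lengths, threshold));
  }
}
{code}

The early exit matters most on file systems where per-file length lookups are remote calls (e.g. object stores), which is the performance concern the issue describes.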