Dynamic partition insert performance problem
--------------------------------------------

                 Key: HIVE-2087
                 URL: https://issues.apache.org/jira/browse/HIVE-2087
             Project: Hive
          Issue Type: Bug
          Components: Metastore
    Affects Versions: 0.7.0
         Environment: Amazon EMR, S3
            Reporter: Q Long


Create an external table T backed by S3 and partitioned by column P. 
Populate T so that it has a large number of partitions (say 100), then 
execute a statement like

insert overwrite table T partition (p) select * from another_table

Check the Hive server log: it shows that all existing partitions are read 
and loaded before any mapper starts working. This seems excessive, given 
that the insert statement may create or overwrite only a very small number 
of partitions. Is there some other reason that an insert using dynamic 
partitions requires loading the whole table?
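
For reference, the setup described above could be reproduced roughly as 
follows. This is only a sketch: the bucket name, the column names, and the 
layout of another_table are assumptions made for illustration and are not 
taken from the actual environment.

```sql
-- Hypothetical names: bucket "my-bucket", columns "id" and "val".
-- Enable dynamic partitioning; nonstrict mode is needed because no
-- static partition value is given in the INSERT below.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- External table stored on S3, partitioned by p.
CREATE EXTERNAL TABLE T (id INT, val STRING)
PARTITIONED BY (p STRING)
LOCATION 's3://my-bucket/t/';

-- Register existing partitions; repeat for roughly 100 values of p.
ALTER TABLE T ADD PARTITION (p='1') LOCATION 's3://my-bucket/t/p=1/';
ALTER TABLE T ADD PARTITION (p='2') LOCATION 's3://my-bucket/t/p=2/';

-- another_table is assumed to have columns (id, val, p), so the last
-- column of SELECT * supplies the dynamic partition value.
INSERT OVERWRITE TABLE T PARTITION (p)
SELECT * FROM another_table;
```

With a table set up this way, the partition loading described above would 
be expected to show up in the log before the first mapper starts, even if 
another_table only contains rows for one or two values of p.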



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira