[ 
https://issues.apache.org/jira/browse/HIVE-14633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15449148#comment-15449148
 ] 

Abhishek Somani commented on HIVE-14633:
----------------------------------------

I think the number of mappers can be controlled by other means, such as 
split-size configurations or using CombineHiveInputFormat. Is this a Tez use 
case? Tez also does split grouping, which should lead to fewer mappers.
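
To illustrate why grouping small files cuts the mapper count, here is a 
minimal sketch of greedy size-based grouping. It is a simplification for 
illustration only, not the actual Tez or CombineHiveInputFormat algorithm; 
the function name group_splits and the byte limits are hypothetical, and the 
sizes are the eight file sizes from the second listing in the issue.

```python
def group_splits(file_sizes, max_group_bytes):
    """Greedily pack consecutive file sizes into groups of at most max_group_bytes.

    Illustrative only: real split grouping also weighs data locality,
    minimum group sizes, and the desired number of tasks.
    """
    groups = []
    current, current_size = [], 0
    for size in file_sizes:
        # Close the current group when the next file would overflow it.
        if current and current_size + size > max_group_bytes:
            groups.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        groups.append(current)
    return groups


# File sizes in bytes, in the order of the second listing in the issue.
sizes = [49, 49, 49, 49, 308, 302, 49, 49]

# With a hypothetical 16 MB limit, all eight tiny files fit in one group.
groups = group_splits(sizes, max_group_bytes=16 * 1024 * 1024)
print(len(sizes), "files ->", len(groups), "group(s)")
```

Under these assumptions, one grouped split covers all eight files, so one 
mapper would read them instead of eight.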

> #.of Files in a partition ! = #.Of buckets in a partitioned,bucketed table
> --------------------------------------------------------------------------
>
>                 Key: HIVE-14633
>                 URL: https://issues.apache.org/jira/browse/HIVE-14633
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 1.2.1
>         Environment: HDP 2.3.2
>            Reporter: Hanu
>
> Ideally, the number of files in a partition should equal the number of 
> buckets declared in the table DDL. This holds after the initial insert and 
> after every INSERT OVERWRITE. However, INSERT INTO on a bucketed Hive table 
> creates extra files. 
> Example:
> # of Buckets = 4
> No. of files after initial insert --> 4
> No. of files after 2nd insert --> 8
> No. of files after 3rd insert --> 12
> No. of files after nth insert --> n * (# of Buckets)
> First insert listing: 
> -rwxrwxrwx   3 hvallur hdfs         49 2016-08-25 12:42 hdfs://dshdp-dev-cluster/apps/hive/warehouse/upsert_testing.db/test3/lname=vr/000000_0
> -rwxrwxrwx   3 hvallur hdfs         49 2016-08-25 12:42 hdfs://dshdp-dev-cluster/apps/hive/warehouse/upsert_testing.db/test3/lname=vr/000001_0
> -rwxrwxrwx   3 hvallur hdfs        308 2016-08-25 12:42 hdfs://dshdp-dev-cluster/apps/hive/warehouse/upsert_testing.db/test3/lname=vr/000002_0
> -rwxrwxrwx   3 hvallur hdfs         49 2016-08-25 12:42 hdfs://dshdp-dev-cluster/apps/hive/warehouse/upsert_testing.db/test3/lname=vr/000003_0
> 2nd Insert:
> -rwxrwxrwx   3 hvallur hdfs         49 2016-08-25 12:42 hdfs://dshdp-dev-cluster/apps/hive/warehouse/upsert_testing.db/test3/lname=vr/000000_0
> -rwxrwxrwx   3 hvallur hdfs         49 2016-08-25 12:47 hdfs://dshdp-dev-cluster/apps/hive/warehouse/upsert_testing.db/test3/lname=vr/000000_0_copy_1
> -rwxrwxrwx   3 hvallur hdfs         49 2016-08-25 12:42 hdfs://dshdp-dev-cluster/apps/hive/warehouse/upsert_testing.db/test3/lname=vr/000001_0
> -rwxrwxrwx   3 hvallur hdfs         49 2016-08-25 12:47 hdfs://dshdp-dev-cluster/apps/hive/warehouse/upsert_testing.db/test3/lname=vr/000001_0_copy_1
> -rwxrwxrwx   3 hvallur hdfs        308 2016-08-25 12:42 hdfs://dshdp-dev-cluster/apps/hive/warehouse/upsert_testing.db/test3/lname=vr/000002_0
> -rwxrwxrwx   3 hvallur hdfs        302 2016-08-25 12:47 hdfs://dshdp-dev-cluster/apps/hive/warehouse/upsert_testing.db/test3/lname=vr/000002_0_copy_1
> -rwxrwxrwx   3 hvallur hdfs         49 2016-08-25 12:42 hdfs://dshdp-dev-cluster/apps/hive/warehouse/upsert_testing.db/test3/lname=vr/000003_0
> -rwxrwxrwx   3 hvallur hdfs         49 2016-08-25 12:47 hdfs://dshdp-dev-cluster/apps/hive/warehouse/upsert_testing.db/test3/lname=vr/000003_0_copy_1
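
The second listing can be checked per bucket with a short script. This is a 
rough sketch that assumes the leading six-digit number in each file name 
identifies the bucket and that appended files carry a _copy_N suffix, as the 
names in the listing suggest; the helper name files_per_bucket and the regular 
expression are hypothetical, not part of Hive.

```python
import re
from collections import defaultdict

# Assumed naming pattern: <6-digit bucket id>_<attempt>[_copy_<n>]
BUCKET_FILE = re.compile(r"^(\d{6})_\d+(?:_copy_\d+)?$")


def files_per_bucket(names):
    """Group file names by the bucket id parsed from their prefix."""
    buckets = defaultdict(list)
    for name in names:
        match = BUCKET_FILE.match(name)
        if match:  # ignore names that do not follow the assumed pattern
            buckets[int(match.group(1))].append(name)
    return dict(buckets)


# Base names of the files in the listing after the second insert.
names = [
    "000000_0", "000000_0_copy_1",
    "000001_0", "000001_0_copy_1",
    "000002_0", "000002_0_copy_1",
    "000003_0", "000003_0_copy_1",
]

for bucket_id, files in sorted(files_per_bucket(names).items()):
    print(bucket_id, len(files), files)
```

Under that naming assumption, the eight files still map onto four bucket ids, 
with two files per bucket after the second insert.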



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
