[ 
https://issues.apache.org/jira/browse/HIVE-18350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-18350:
----------------------------------
    Description: 
Insert statements create files whose names end with 0000_0, 0001_0, etc.
However, LOAD DATA keeps the input file name. This results in an inconsistent
naming convention, which makes SMB joins difficult in some scenarios and may
cause trouble for other types of queries in the future.

We need a consistent naming convention.


For a non-bucketed table, Hive renames all files regardless of how the user
named them.
For a bucketed table in non-strict mode, Hive relies on the user to name the
files so that they match their buckets, and assumes that all data in a file
belongs to the same bucket. In strict mode, loading into a bucketed table is
disabled.


  was:
Insert statements create files whose names end with 0000_0, 0001_0, etc.
However, LOAD DATA keeps the input file name. This results in an inconsistent
naming convention, which makes SMB joins difficult in some scenarios and may
cause trouble for other types of queries in the future.

We need a consistent naming convention.


> load data should rename files consistent with insert statements
> ---------------------------------------------------------------
>
>                 Key: HIVE-18350
>                 URL: https://issues.apache.org/jira/browse/HIVE-18350
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Deepak Jaiswal
>            Assignee: Deepak Jaiswal
>
> Insert statements create files whose names end with 0000_0, 0001_0, etc.
> However, LOAD DATA keeps the input file name. This results in an inconsistent
> naming convention, which makes SMB joins difficult in some scenarios and may
> cause trouble for other types of queries in the future.
> We need a consistent naming convention.
> For a non-bucketed table, Hive renames all files regardless of how the user
> named them.
> For a bucketed table in non-strict mode, Hive relies on the user to name the
> files so that they match their buckets, and assumes that all data in a file
> belongs to the same bucket. In strict mode, loading into a bucketed table is
> disabled.
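
To make the renaming rules quoted above concrete, here is a minimal sketch of how a rename plan could be computed. It is not Hive's implementation. The helper names (insert_style_name, plan_renames), the six-digit default width, the sorted ordering for non-bucketed tables, and the regular expression used to read a bucket number from a user-supplied file name are all assumptions made for illustration.

```python
import re


def insert_style_name(index, width=6, attempt=0):
    """Return an insert-style name such as 000001_0 (hypothetical helper)."""
    return f"{index:0{width}d}_{attempt}"


def plan_renames(input_names, bucketed=False, width=6):
    """Map each loaded file name to an insert-style target name.

    Illustrative only: the ordering and parsing rules are assumptions.
    """
    plan = {}
    if not bucketed:
        # Non-bucketed table: ignore the user's names and number the
        # files sequentially (sorted order is an arbitrary choice here).
        for index, name in enumerate(sorted(input_names)):
            plan[name] = insert_style_name(index, width)
        return plan

    # Bucketed table, non-strict mode: trust the bucket number that the
    # user encoded at the start of the file name.
    for name in input_names:
        match = re.match(r"(\d+)_\d+", name)
        if match is None:
            raise ValueError(f"cannot infer bucket from file name: {name}")
        plan[name] = insert_style_name(int(match.group(1)), width)
    return plan
```

For example, plan_renames(["b.csv", "a.csv"]) maps a.csv to 000000_0 and b.csv to 000001_0, while the bucketed branch rejects a name such as a.csv because it carries no bucket number.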



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
