> On July 29, 2016, 11:33 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java, lines 1840-1841
> > <https://reviews.apache.org/r/50359/diff/5/?file=1456814#file1456814line1840>
> >
> >     I don't follow this. Comment doesn't seem to match code. 
> >     FileSystem.rename() should automatically do copy+delete for S3. So, why 
> > do we need to do that explictly?
> >     Per your comment, you want to delete temp dir, but that should already 
> > be handled in Context::clear()
> >     Per your code, you are deleting preexisting files on target dir but as 
> > I said that should already be handled in fs.rename()

Yes, FileSystem.rename() is handleding the copy+delete for S3. However, for the 
INSERT OVERWRITE case, the temporary directory that contains 000000_0 also 
contains a .hive-staging directory that is also copied to S3. This 
.hive-staging directory should be deleted automatically on HDFS by the 
deleteOnExit() call, but when this directory is copied to S3, then this 
deleteOnExit flag is not copied, so the data is kept on S3.

I thought I could point statsTmpLoc to a different location instead. Then I 
found another place where another temporary directory in .hive-staging is 
created too. So, instead on fixing these 2, I thought that maybe this can be 
handled this way by doing it explicitly. Other developers may use 
getExtTmpPathRelTo() in the future again and they will add more temp data to 
.hive-staging, so I just wanted to prevent copying unwanted files to S3.

What do you think?


> On July 29, 2016, 11:33 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java, line 1807
> > <https://reviews.apache.org/r/50359/diff/5/?file=1456814#file1456814line1807>
> >
> >     I am not sure if this change is really needed. But, if it does, won't 
> > be need equivalent in loadPartition() & loadDynamicPartitions().

Thanks.
I'll take a look at this.


- Sergio


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/#review144220
-----------------------------------------------------------


On July 28, 2016, 8:11 p.m., Sergio Pena wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50359/
> -----------------------------------------------------------
> 
> (Updated July 28, 2016, 8:11 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-14270
>     https://issues.apache.org/jira/browse/HIVE-14270
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> This patch will create a temporary directory for Hive intermediate data on 
> HDFS when S3 tables are used.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/common/BlobStorageUtils.java 
> PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 
> e92466f172c81fce20fe951df58f6561d28dc215 
>   common/src/test/org/apache/hadoop/hive/common/TestBlobStorageUtils.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/Context.java 
> ec5d693d28a40925c44f844a05ebf3f5c10173c9 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 
> 9d927bd1a519f79bc7fa88c3b7e5c6cc2ef0637f 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 
> 2671cb1cf2ef74f9d6628f8cdf3f5ac99283dbd8 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/TestContext.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/50359/diff/
> 
> 
> Testing
> -------
> 
> NO PATCH
> ** NON-PARTITIONED TABLE
> 
> - create table dummy (id int);                                                
>                            3.651s
> - insert into table s3dummy values (1);                                       
>                           39.231s
> - insert overwrite table s3dummy values (1);                                  
>                           42.569s
> - insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from 
> dummy;                     30.136s
> 
> EXTERNAL TABLE
> 
> - create table s3dummy_ext like s3dummy location 
> 's3a://spena-bucket/user/hive/warehouse/s3dummy';       9.297s
> - insert into table s3dummy_ext values (1);                                   
>                           45.855s
> 
> WITH PATCH
> 
> ** NON-PARTITIONED TABLE
> - create table s3dummy (id int) location 
> 's3a://spena-bucket/user/hive/warehouse/s3dummy';               3.945s
> - insert into table s3dummy values (1);                                       
>                           15.025s
> - insert overwrite table s3dummy values (1);                                  
>                           25.149s     
> - insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from 
> dummy;                     19.158s      
> - from dummy insert overwrite table s3dummy select *;                         
>                           25.469s      
> - from dummy insert into table s3dummy select *;                              
>                           14.501s
> 
> ** EXTERNAL TABLE
> - create table s3dummy_ext like s3dummy location 
> 's3a://spena-bucket/user/hive/warehouse/s3dummy';       4.827s
> - insert into table s3dummy_ext values (1);                                   
>                           16.070s
> 
> ** PARTITIONED TABLE
> - create table s3dummypart (id int) partitioned by (part int)
>   location 's3a://spena-bucket/user/hive/warehouse/s3dummypart';              
>                            3.176s
> - alter table s3dummypart add partition (part=1);                             
>                            3.229s
> - alter table s3dummypart add partition (part=2);                             
>                            3.124s
> - insert into table s3dummypart partition (part=1) values (1);                
>                           14.876s
> - insert overwrite table s3dummypart partition (part=1) values (1);           
>                           27.594s     
> - insert overwrite directory 's3a://spena-bucket/dirs/s3dummypart' select * 
> from dummypart;             22.298s      
> - from dummypart insert overwrite table s3dummypart partition (part=1) select 
> id;                       29.001s      
> - from dummypart insert into table s3dummypart partition (part=1) select id;  
>                           14.869s
> 
> ** DYNAMIC PARTITIONS
> - insert into table s3dummypart partition (part) select id, 1 from dummypart; 
>                           15.185s
> - insert into table s3dummypart partition (part) select id, 1 from dummypart; 
>                           18.820s
> 
> 
> Thanks,
> 
> Sergio Pena
> 
>

Reply via email to