[ https://issues.apache.org/jira/browse/HIVE-24774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17284656#comment-17284656 ]

Peter Varga commented on HIVE-24774:
------------------------------------

Hi [~rajesh.balamohan], some of this was solved in
[https://github.com/apache/hive/pull/1893/files].
The file list is reused, and I removed the isDirectory call in most cases. I
think the FileSystem calls cannot be avoided 100%, because even if we kept the
FileStatus object list, we would still need to call FS.getFileChecksum.
Also check out this PR: [https://github.com/apache/hive/pull/1971]
It is not yet committed, but it will fully avoid any listing during
loadPartition in the case of direct insert.
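To make the trade-off in the comment concrete, here is a self-contained Java sketch that counts simulated remote calls. `Status` and `RemoteFs` are hypothetical stand-ins for Hadoop's `FileStatus` and a remote `FileSystem` such as S3A; this is not the code from PR 1893, only an illustration of why reusing the listing removes the per-entry `isDirectory` round trip while the per-file checksum call remains.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins: Status models Hadoop's FileStatus and RemoteFs models a
// remote FileSystem such as S3A. Every RemoteFs method counts as one round trip.
class ListingReuseSketch {

    static final class Status {
        final String path;
        final boolean directory;

        Status(String path, boolean directory) {
            this.path = path;
            this.directory = directory;
        }
    }

    static final class RemoteFs {
        private final List<Status> entries; // flat set of entries, for simplicity
        int calls = 0;                      // number of simulated remote round trips

        RemoteFs(List<Status> entries) {
            this.entries = entries;
        }

        List<Status> listStatus(String dir) {
            calls++;
            List<Status> out = new ArrayList<>();
            for (Status s : entries) {
                if (s.path.startsWith(dir + "/")) {
                    out.add(s);
                }
            }
            return out;
        }

        boolean isDirectory(String path) {
            calls++; // extra round trip per entry, like S3AFileSystem.isDirectory in the stack trace
            for (Status s : entries) {
                if (s.path.equals(path)) {
                    return s.directory;
                }
            }
            return false;
        }

        String getFileChecksum(String path) {
            calls++; // per-file call that a listing cannot replace
            return "checksum-of-" + path;
        }
    }

    // List the directory, then ask the FS about each entry again.
    static List<String> checksumsWithIsDirectory(RemoteFs fs, String dir) {
        List<String> sums = new ArrayList<>();
        for (Status s : fs.listStatus(dir)) {
            if (!fs.isDirectory(s.path)) {
                sums.add(fs.getFileChecksum(s.path));
            }
        }
        return sums;
    }

    // Reuse the listed status: the directory check becomes a local field read.
    static List<String> checksumsReusingListing(RemoteFs fs, String dir) {
        List<String> sums = new ArrayList<>();
        for (Status s : fs.listStatus(dir)) {
            if (!s.directory) {
                sums.add(fs.getFileChecksum(s.path));
            }
        }
        return sums;
    }
}
```

For a partition with E listed entries of which N are files, the first variant makes 1 + E + N calls and the second 1 + N. The N checksum calls are still there in both, which matches the point that the FileSystem calls cannot be avoided entirely.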

> Reduce FS listing during dynamic partition loading
> --------------------------------------------------
>
>                 Key: HIVE-24774
>                 URL: https://issues.apache.org/jira/browse/HIVE-24774
>             Project: Hive
>          Issue Type: Improvement
>          Components: HiveServer2
>            Reporter: Rajesh Balamohan
>            Priority: Major
>
> When loading a large number of partitions in cloud storage, the notification
> log takes a lot longer to list newly added files.
> It would be good to explore whether FileStatus can be reused from
> Hive::listFilesCreatedByQuery or from copyFiles.
> {noformat}
>       at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:3031)
>       at org.apache.hadoop.fs.s3a.S3AFileSystem.isDirectory(S3AFileSystem.java:4171)
>       at org.apache.hadoop.hive.ql.metadata.Hive.addInsertFileInformation(Hive.java:3566)
>       at org.apache.hadoop.hive.ql.metadata.Hive.addWriteNotificationLog(Hive.java:3519)
>       at org.apache.hadoop.hive.ql.metadata.Hive.addWriteNotificationLog(Hive.java:3504)
>       at org.apache.hadoop.hive.ql.metadata.Hive.loadDynamicPartitions(Hive.java:2984)
>       at org.apache.hadoop.hive.ql.exec.MoveTask.handleDynParts(MoveTask.java:562)
>       at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:440)
>       at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
>       at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
>       at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:357)
>       at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330)
>       at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246)
>       at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109)
>       at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:730)
>       at org.apache.hadoop.hive.ql.Driver.run(Driver.java:490)
>       at org.apache.hadoop.hive.ql.Driver.run(Driver.java:484)
> {noformat}
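The description suggests reusing FileStatus information from Hive::listFilesCreatedByQuery or copyFiles instead of listing again while writing the notification log. A minimal Java sketch of that idea follows, assuming a hypothetical `CreatedFilesRegistry` (not a Hive class) that the copy step fills in and the notification step reads, falling back to a listing only for partitions it has no record of. Paths are plain strings here; a real version would presumably keep the FileStatus objects themselves so that file length and other attributes come along without extra calls.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical registry: remembers which files the copy/move step produced per
// partition directory, so the notification step does not need to list it again.
class CreatedFilesRegistry {
    private final Map<String, List<String>> filesByPartition = new HashMap<>();

    // Called by the step that copies or moves files into a partition directory.
    void recordCopied(String partitionDir, List<String> files) {
        filesByPartition.computeIfAbsent(partitionDir, k -> new ArrayList<>()).addAll(files);
    }

    // Called while building the notification entry. Returns the recorded files when
    // present; otherwise falls back to the supplied (remote, expensive) lister.
    List<String> filesFor(String partitionDir, Function<String, List<String>> lister) {
        List<String> known = filesByPartition.get(partitionDir);
        if (known != null) {
            return Collections.unmodifiableList(known);
        }
        return lister.apply(partitionDir);
    }
}
```

The fallback keeps the sketch correct for partitions whose files were not produced by a recorded copy, while the common path costs no listing at all.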



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
