[ 
https://issues.apache.org/jira/browse/HIVE-24535?focusedWorklogId=526952&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-526952
 ]

ASF GitHub Bot logged work on HIVE-24535:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 21/Dec/20 21:02
            Start Date: 21/Dec/20 21:02
    Worklog Time Spent: 10m 
      Work Description: pvargacl commented on a change in pull request #1779:
URL: https://github.com/apache/hive/pull/1779#discussion_r546927637



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
##########
@@ -1253,19 +1190,12 @@ public static ParsedDelta parsedDelta(Path deltaDir, String deltaPrefix, FileSys
       ParsedDelta p = parsedDelta(deltaDir, isRawFormat);
       List<HdfsFileStatusWithId> files = null;
       if (dirSnapshot != null) {
+        final PathFilter filter = isRawFormat ? AcidUtils.originalBucketFilter : AcidUtils.bucketFileFilter;
+        // If we already know the files, store it for future use
         files = dirSnapshot.getFiles().stream()
-            .filter(fileStatus -> bucketFileFilter.accept(fileStatus.getPath()))
+            .filter(fileStatus -> filter.accept(fileStatus.getPath()))
             .map(HdfsFileStatusWithoutId::new)
             .collect(Collectors.toList());
-      } else if (isDeleteDelta) {
-        // For delete deltas we need the files for AcidState
-        try {
-          files = SHIMS.listLocatedHdfsStatus(fs, deltaDir, bucketFileFilter);
-        } catch (UnsupportedOperationException uoe) {
-          files = Arrays.stream(fs.listStatus(deltaDir, bucketFileFilter))
-              .map(HdfsFileStatusWithoutId::new)
-              .collect(Collectors.toList());
-        }

Review comment:
       For delete delta folders, the metadata for all files is sent to the
execution side, so the FileId can be used to retrieve the ORC tail (and, with
Adam's new feature, the whole file) from the LLAP cache. This branch was here
to ensure the file list is propagated even on HDFS, where we have nothing in
the directory snapshot. That functionality is now hidden inside
ParsedDelta#getFiles: it either returns the cached list or makes the
FileSystem call itself if that has not been done before.
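
To illustrate the "return the cached list or do the listing on first use"
behaviour described above, here is a minimal sketch. It is not the Hive
implementation: the class name LazyDelta, the use of plain String file names,
and the Supplier that stands in for the FileSystem call are all assumptions
made for the example.

```java
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical stand-in for a parsed delta directory; not the Hive class. */
final class LazyDelta {
    // Stands in for the FileSystem listing call (an assumption of this sketch).
    private final Supplier<List<String>> lister;
    // Null until the file list is known, either from a snapshot or from a listing.
    private List<String> files;

    /** knownFiles may be null when no directory snapshot was available. */
    LazyDelta(List<String> knownFiles, Supplier<List<String>> lister) {
        this.files = knownFiles;
        this.lister = lister;
    }

    /** Returns the stored list, listing the directory only on the first call that needs it. */
    List<String> getFiles() {
        if (files == null) {
            files = lister.get();
        }
        return files;
    }
}
```

With this shape the caller no longer needs a separate branch for delete deltas:
whoever needs the files calls getFiles(), and the listing happens at most once
per object. The sketch is not thread-safe.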






Issue Time Tracking
-------------------

    Worklog Id:     (was: 526952)
    Time Spent: 1h 10m  (was: 1h)

> Cleanup AcidUtils.Directory and remove unnecessary filesystem listings
> ----------------------------------------------------------------------
>
>                 Key: HIVE-24535
>                 URL: https://issues.apache.org/jira/browse/HIVE-24535
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Peter Varga
>            Assignee: Peter Varga
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> * AcidUtils.getAcidState does a recursive listing on the S3 FileSystem, so it
> already knows the content of each delta and base directory. This could be
> returned to OrcInputFormat to avoid listing each delta directory again there.
> * The AcidUtils.getAcidState submethods collect more and more information
> about the state of the data directory. This could be written directly to the
> final Directory object to avoid methods with 10+ parameters.
> * AcidUtils.Directory, OrcInputFormat.AcidDirInfo and AcidUtils.TxnBase can
> be merged into one class to remove duplication.
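
The second point above (collecting state directly into the final object
instead of threading many parameters through helper methods) can be sketched
roughly as follows. Every name here (DirectoryState, AcidStateSketch, the
fields) is hypothetical, and the classification rule is deliberately
simplified; Hive's actual logic compares write IDs and is not reproduced.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical result holder; field names are illustrative, not Hive's. */
final class DirectoryState {
    String base;                                     // last base directory seen, or null
    final List<String> deltas = new ArrayList<>();   // delta and delete_delta directories
    final List<String> obsolete = new ArrayList<>(); // bases superseded by a later one
}

final class AcidStateSketch {
    /** Scans child directory names and fills one state object. */
    static DirectoryState scan(List<String> children) {
        DirectoryState state = new DirectoryState();
        for (String child : children) {
            classify(child, state);
        }
        return state;
    }

    // The helper takes the state object rather than one parameter per output list.
    private static void classify(String child, DirectoryState state) {
        if (child.startsWith("base_")) {
            // Simplification: a later base always replaces the earlier one.
            if (state.base != null) {
                state.obsolete.add(state.base);
            }
            state.base = child;
        } else if (child.startsWith("delta_") || child.startsWith("delete_delta_")) {
            state.deltas.add(child);
        }
        // Anything else is ignored in this sketch.
    }
}
```

Adding another piece of collected information then means adding a field to
the holder, not another parameter to every helper signature.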



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
