[ 
https://issues.apache.org/jira/browse/HIVE-24535?focusedWorklogId=526951&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-526951
 ]

ASF GitHub Bot logged work on HIVE-24535:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 21/Dec/20 20:53
            Start Date: 21/Dec/20 20:53
    Worklog Time Spent: 10m 
      Work Description: pvargacl commented on a change in pull request #1779:
URL: https://github.com/apache/hive/pull/1779#discussion_r546924170



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
##########
@@ -994,18 +857,76 @@ public long getVisibilityTxnId() {
     public Path getBaseDirPath() {
       return baseDirPath;
     }
-    public static ParsedBase parseBase(Path path) {
+
+
+
+    public static ParsedBaseLight parseBase(Path path) {
       String filename = path.getName();
       if(!filename.startsWith(BASE_PREFIX)) {
         throw new IllegalArgumentException(filename + " does not start with " + BASE_PREFIX);
       }
       int idxOfv = filename.indexOf(VISIBILITY_PREFIX);
       if(idxOfv < 0) {
-        return new ParsedBase(Long.parseLong(filename.substring(BASE_PREFIX.length())), path);
+        return new ParsedBaseLight(Long.parseLong(filename.substring(BASE_PREFIX.length())), path);
       }
-      return new ParsedBase(Long.parseLong(filename.substring(BASE_PREFIX.length(), idxOfv)),
+      return new ParsedBaseLight(Long.parseLong(filename.substring(BASE_PREFIX.length(), idxOfv)),
           Long.parseLong(filename.substring(idxOfv + VISIBILITY_PREFIX.length())), path);
     }
+
+    @Override
+    public String toString() {
+      return "Path: " + baseDirPath + "; writeId: "
+          + writeId + "; visibilityTxnId: " + visibilityTxnId;
+    }
+  }
+  /**
+   * In addition to {@link ParsedBaseLight} this knows if the data is in raw format, i.e. doesn't
+   * have acid metadata columns embedded in the files.  To determine this in some cases
+   * requires looking at the footer of the data file which can be expensive so if this info is
+   * not needed {@link ParsedBaseLight} should be used.
+   */
+  public static final class ParsedBase extends ParsedBaseLight {

Review comment:
       ParsedBase represents a base directory containing many files. The name
AcidBaseFileInfo is rather confusing to me, but it represents any data file that
can be in an ACID table (an original file, a bucket file in a base, or a bucket
file in a delta). These are the "base" files for ORC splits.

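For readers without the Hive source at hand, the directory-name parsing shown
in the diff above can be sketched as a standalone class. This is only an
illustration, not the code in AcidUtils: the class name BaseDirName is
hypothetical, the values "base_" and "_v" for BASE_PREFIX and
VISIBILITY_PREFIX are assumptions (the diff references the constants but does
not show their values), and using 0 for a missing visibility transaction id
is also an assumption.

```java
/** Hypothetical stand-in for the light-weight parsed base; not Hive's class. */
final class BaseDirName {
    // Assumed values; the diff only references these constants by name.
    static final String BASE_PREFIX = "base_";
    static final String VISIBILITY_PREFIX = "_v";

    final long writeId;
    final long visibilityTxnId; // 0 when the name carries no visibility suffix (assumption)

    private BaseDirName(long writeId, long visibilityTxnId) {
        this.writeId = writeId;
        this.visibilityTxnId = visibilityTxnId;
    }

    /** Parses names shaped like base_N or base_N_vM using string operations only. */
    static BaseDirName parse(String dirName) {
        if (!dirName.startsWith(BASE_PREFIX)) {
            throw new IllegalArgumentException(dirName + " does not start with " + BASE_PREFIX);
        }
        int idxOfV = dirName.indexOf(VISIBILITY_PREFIX);
        if (idxOfV < 0) {
            // No visibility suffix: everything after the prefix is the write id.
            return new BaseDirName(Long.parseLong(dirName.substring(BASE_PREFIX.length())), 0L);
        }
        long writeId = Long.parseLong(dirName.substring(BASE_PREFIX.length(), idxOfV));
        long txnId = Long.parseLong(dirName.substring(idxOfV + VISIBILITY_PREFIX.length()));
        return new BaseDirName(writeId, txnId);
    }

    @Override
    public String toString() {
        return "writeId: " + writeId + "; visibilityTxnId: " + visibilityTxnId;
    }
}
```

Nothing here opens a data file, which is the point of the light-weight variant:
whether the data is in raw format would be determined separately, and only
when a caller actually needs it.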





Issue Time Tracking
-------------------

    Worklog Id:     (was: 526951)
    Time Spent: 1h  (was: 50m)

> Cleanup AcidUtils.Directory and remove unnecessary filesystem listings
> ----------------------------------------------------------------------
>
>                 Key: HIVE-24535
>                 URL: https://issues.apache.org/jira/browse/HIVE-24535
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Peter Varga
>            Assignee: Peter Varga
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> * AcidUtils.getAcidState does a recursive listing on the S3 FileSystem, so it
> already knows the contents of each delta and base directory. These could be
> returned to OrcInputFormat to avoid listing each delta directory again there.
> * The AcidUtils.getAcidState submethods collect more and more information about
> the state of the data directory. This could be written directly into the final
> Directory object to avoid methods with 10+ parameters.
> * AcidUtils.Directory, OrcInputFormat.AcidDirInfo and AcidUtils.TxnBase can
> be merged into one class to remove duplication.
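
The idea in the description, recording what a single listing finds directly
on one result object instead of passing many parameters between helper
methods, can be sketched roughly as follows. All names here (AcidStateSketch,
add, fromListing) are hypothetical, the "base_" and "delta_" prefixes are
assumed, and the handling of bases is deliberately simplified; this is not
the Directory class from the pull request.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical accumulator for one directory listing; not Hive's AcidUtils.Directory. */
final class AcidStateSketch {
    // Assumed directory-name prefixes, for illustration only.
    static final String BASE_PREFIX = "base_";
    static final String DELTA_PREFIX = "delta_";

    String base; // last base directory seen, or null if none
    final List<String> deltas = new ArrayList<>();
    final List<String> originals = new ArrayList<>();

    /** Classifies one listed entry by name and records it on this object. */
    void add(String name) {
        if (name.startsWith(BASE_PREFIX)) {
            base = name; // simplification: no selection of the best base
        } else if (name.startsWith(DELTA_PREFIX)) {
            deltas.add(name);
        } else {
            originals.add(name);
        }
    }

    /** Builds the state from one listing so later consumers need not list again. */
    static AcidStateSketch fromListing(List<String> names) {
        AcidStateSketch state = new AcidStateSketch();
        for (String name : names) {
            state.add(name);
        }
        return state;
    }
}
```

A consumer would read base, deltas and originals from the returned object
rather than issuing further filesystem listings of its own.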



