vinothchandar commented on a change in pull request #2064:
URL: https://github.com/apache/hudi/pull/2064#discussion_r484144517



##########
File path: 
hudi-client/src/main/java/org/apache/hudi/client/HoodieWriteClient.java
##########
@@ -665,6 +669,10 @@ protected void completeCompaction(HoodieCommitMetadata 
metadata, JavaRDD<WriteSt
     List<HoodieWriteStat> writeStats = 
writeStatuses.map(WriteStatus::getStat).collect();
     finalizeWrite(table, compactionCommitTime, writeStats);
     LOG.info("Committing Compaction " + compactionCommitTime + ". Finished 
with result " + metadata);
+
+    // Update Metadata Table
+    HoodieMetadata.update(config, metadata, compactionCommitTime);

Review comment:
       Note to self: where do updates for cleaning etc. happen?

##########
File path: 
hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
##########
@@ -116,6 +119,17 @@
   public static final String MAX_CONSISTENCY_CHECKS_PROP = 
"hoodie.consistency.check.max_checks";
   public static int DEFAULT_MAX_CONSISTENCY_CHECKS = 7;
 
+  // Enable the internal Metadata Table which saves file listings
+  private static final String USE_FILE_LISTING_METADATA = 
"hoodie.metadata.enable";

Review comment:
       Please match the property names with the variable names here.

   Maybe rename this to `hoodie.metadata.file.listings.enable`? That would be
more suitable if you intend to guard file listing alone with this.

##########
File path: 
hudi-client/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
##########
@@ -121,6 +122,9 @@ public boolean commitStats(String instantTime, 
List<HoodieWriteStat> stats, Opti
     metadata.setOperationType(operationType);
 
     try {
+      // Update Metadata Table
+      HoodieMetadata.update(config, metadata, instantTime);

Review comment:
       Would be good to do this from a member variable as opposed to calling a 
static helper. Will leave more concrete suggestions below
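A minimal sketch of what the reviewer is suggesting: the writer client holds the metadata updater as an instance field whose lifecycle matches the client, instead of routing through a static helper. `MetadataWriter` and `WriteClient` are illustrative stand-ins, not Hudi's actual classes.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for the metadata-table updater; in Hudi this would apply the
// commit metadata to the internal Metadata Table.
class MetadataWriter {
  private final List<String> committedInstants = new ArrayList<>();

  void update(String commitMetadata, String instantTime) {
    committedInstants.add(instantTime);
  }

  List<String> committedInstants() {
    return committedInstants;
  }
}

class WriteClient {
  // One writer per client instance; no global static state to reason about.
  private final MetadataWriter metadataWriter = new MetadataWriter();

  void commitStats(String metadata, String instantTime) {
    metadataWriter.update(metadata, instantTime);
  }

  MetadataWriter metadataWriter() {
    return metadataWriter;
  }
}
```

With an instance field, the updater can be mocked in tests and torn down with the client, which a static `HoodieMetadata.update(...)` makes awkward.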

##########
File path: 
hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
##########
@@ -884,6 +921,15 @@ public Builder withCompactionConfig(HoodieCompactionConfig 
compactionConfig) {
       return this;
     }
 
+    public Builder withMetadataCompactionConfig(HoodieCompactionConfig 
compactionConfig) throws IOException {
+      // Since the property names from HoodieCompactionConfig are already used 
in withCompactionConfig,

Review comment:
       Can’t we use the extra namespace under `metadata` and store these as
individual fields? That’s more extensible and easier to grok, IMO.
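A sketch of the namespacing idea: each metadata-table compaction setting becomes its own `hoodie.metadata.`-prefixed property with an explicit default, rather than one serialized `HoodieCompactionConfig` blob. The key and default shown here are hypothetical, not Hudi's actual names.

```java
import java.util.Properties;

class MetadataConfigSketch {
  // Hypothetical namespaced key; the prefix keeps it from colliding with the
  // dataset-level compaction properties used by withCompactionConfig.
  static final String METADATA_PREFIX = "hoodie.metadata.";
  static final String COMPACT_MAX_DELTA_COMMITS =
      METADATA_PREFIX + "compact.max.delta.commits";
  static final String DEFAULT_COMPACT_MAX_DELTA_COMMITS = "24";

  private final Properties props;

  MetadataConfigSketch(Properties props) {
    this.props = props;
  }

  int getMetadataCompactMaxDeltaCommits() {
    // Individual typed getter per field, with a default baked in.
    return Integer.parseInt(
        props.getProperty(COMPACT_MAX_DELTA_COMMITS, DEFAULT_COMPACT_MAX_DELTA_COMMITS));
  }
}
```

Individual fields survive round-trips through plain `.properties` files and show up in config dumps, which a serialized blob does not.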

##########
File path: 
hudi-client/src/main/java/org/apache/hudi/table/action/clean/CleanActionExecutor.java
##########
@@ -244,6 +245,10 @@ private HoodieCleanMetadata runClean(HoodieTable<?> table, 
HoodieInstant cleanIn
         inflightInstant = cleanInstant;
       }
 
+      // Update Metadata Table before even finishing clean. This ensures that 
an async clean operation in the
+      // background does not lead to stale metadata being returned from 
Metadata Table.
+      HoodieMetadata.update(config, cleanerPlan, cleanInstant.getTimestamp());

Review comment:
       Cleaning/Compaction/writing can all log updates in parallel. Do we have 
a test around this?
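The shape of the concurrency test the reviewer is asking about could look like this: several writers (standing in for clean/compact/commit paths) log updates in parallel against one shared metadata sink, then the test asserts nothing was lost. All names here are illustrative, and the thread-safe list is only a stand-in for the metadata table's real log-append path.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Stand-in for the shared metadata table; CopyOnWriteArrayList gives us a
// thread-safe append so the sketch itself has no lost updates.
class SharedMetadataLog {
  private final List<String> updates = new CopyOnWriteArrayList<>();

  void update(String instantTime) {
    updates.add(instantTime);
  }

  int size() {
    return updates.size();
  }
}

class ParallelUpdateSketch {
  // Spawn `writers` threads, each logging `updatesPerWriter` updates, and
  // return how many updates the shared log actually recorded.
  static int runParallelUpdates(int writers, int updatesPerWriter) {
    SharedMetadataLog log = new SharedMetadataLog();
    Thread[] threads = new Thread[writers];
    for (int w = 0; w < writers; w++) {
      final int id = w;
      threads[w] = new Thread(() -> {
        for (int i = 0; i < updatesPerWriter; i++) {
          log.update(id + "-" + i);
        }
      });
      threads[w].start();
    }
    for (Thread t : threads) {
      try {
        t.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
    return log.size();
  }
}
```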

##########
File path: 
hudi-client/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java
##########
@@ -180,14 +181,14 @@ public CleanPlanner(HoodieTable<T> hoodieTable, 
HoodieWriteConfig config) {
   }
 
   /**
-   * Scan and list all paritions for cleaning.
+   * Scan and list all partitions for cleaning.
    * @return all partitions paths for the dataset.
    * @throws IOException
    */
   private List<String> getPartitionPathsForFullCleaning() throws IOException {
     // Go to brute force mode of scanning all partitions
-    return FSUtils.getAllPartitionPaths(hoodieTable.getMetaClient().getFs(), 
hoodieTable.getMetaClient().getBasePath(),
-        config.shouldAssumeDatePartitioning());
+    return 
HoodieMetadata.getAllPartitionPaths(hoodieTable.getMetaClient().getFs(),

Review comment:
       Should we control this with a flag?

##########
File path: 
hudi-client/src/main/java/org/apache/hudi/table/action/commit/BaseCommitActionExecutor.java
##########
@@ -230,6 +231,9 @@ protected void commit(Option<Map<String, String>> 
extraMetadata, HoodieWriteMeta
     metadata.addMetadata(HoodieCommitMetadata.SCHEMA_KEY, 
getSchemaToStoreInCommit());
     metadata.setOperationType(operationType);
 
+    // Update Metadata Table
+    HoodieMetadata.update(config, metadata, instantTime);

Review comment:
       Note to self: need to understand how this call is different from the 
call made in writeClient code

##########
File path: 
hudi-client/src/main/java/org/apache/hudi/metadata/HoodieMetadata.java
##########
@@ -0,0 +1,272 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.metadata;
+
+import java.io.IOException;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hudi.avro.model.HoodieCleanerPlan;
+import org.apache.hudi.avro.model.HoodieRestoreMetadata;
+import org.apache.hudi.avro.model.HoodieRollbackMetadata;
+import org.apache.hudi.common.fs.FSUtils;
+import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieMetadataException;
+import org.apache.hudi.exception.TableNotFoundException;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+
+/**
+ * Defines the interface through which file listing metadata is created, 
updated and accessed. This metadata is
+ * saved within an internal Metadata Table.
+ *
+ * If file listing metadata is disabled, the functions default to using 
listStatus(...) RPC calls to retrieve
+ * file listings from the file system.
+ */
+public class HoodieMetadata {
+  private static final Logger LOG = LogManager.getLogger(HoodieMetadata.class);
+
+  // Instances of metadata for each basePath
+  private static Map<String, HoodieMetadataImpl> instances = new HashMap<>();

Review comment:
       Unless we are doing something here across multiple base paths, can we
move away from the static map and just deal with one base path? We can create
multiple instances of this class.
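A sketch of the per-instance alternative: each metadata reader is bound to exactly one base path at construction time, and callers create as many instances as they have tables. `TableMetadataReader` is a hypothetical name, not Hudi's.

```java
// One instance per table base path; no process-wide static
// Map<String, HoodieMetadataImpl> to populate, synchronize, or leak.
class TableMetadataReader {
  private final String basePath;

  TableMetadataReader(String basePath) {
    this.basePath = basePath;
  }

  String basePath() {
    return basePath;
  }
}
```

Callers that genuinely need several tables just hold several readers, e.g. `new TableMetadataReader("/data/t1")` and `new TableMetadataReader("/data/t2")`, with each instance's lifetime owned by whoever created it.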

##########
File path: 
hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
##########
@@ -116,6 +119,17 @@
   public static final String MAX_CONSISTENCY_CHECKS_PROP = 
"hoodie.consistency.check.max_checks";
   public static int DEFAULT_MAX_CONSISTENCY_CHECKS = 7;
 
+  // Enable the internal Metadata Table which saves file listings
+  private static final String USE_FILE_LISTING_METADATA = 
"hoodie.metadata.enable";
+  private static final String DEFAULT_USE_FILE_LISTING_METADATA = "false";
+
+  // Validate contents of Metadata Table on each access against the actual 
filesystem
+  private static final String FILE_LISTING_METADATA_VERIFY = 
"hoodie.metadata.verify";

Review comment:
       Rename to match the property above?

##########
File path: 
hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
##########
@@ -116,6 +119,17 @@
   public static final String MAX_CONSISTENCY_CHECKS_PROP = 
"hoodie.consistency.check.max_checks";
   public static int DEFAULT_MAX_CONSISTENCY_CHECKS = 7;
 
+  // Enable the internal Metadata Table which saves file listings
+  private static final String USE_FILE_LISTING_METADATA = 
"hoodie.metadata.enable";
+  private static final String DEFAULT_USE_FILE_LISTING_METADATA = "false";
+
+  // Validate contents of Metadata Table on each access against the actual 
filesystem
+  private static final String FILE_LISTING_METADATA_VERIFY = 
"hoodie.metadata.verify";
+  private static final String DEFAULT_FILE_LISTING_METADATA_VERIFY = "false";
+
+  // Serialized compaction config to be used for Metadata Table
+  public static final String HOODIE_METADATA_COMPACTION_CONFIG = 
"hoodie.metadata.compaction.config";

Review comment:
       If this is just a constant, can we move it elsewhere? This class
should only contain user-specified configs.

##########
File path: 
hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
##########
@@ -755,6 +769,29 @@ public int getBootstrapParallelism() {
     return 
Integer.parseInt(props.getProperty(HoodieBootstrapConfig.BOOTSTRAP_PARALLELISM));
   }
 
+  /**
+   * File listing metadata configs.
+   */
+  public boolean useFileListingMetadata() {
+    return Boolean.parseBoolean(props.getProperty(USE_FILE_LISTING_METADATA));
+  }
+
+  public boolean getFileListingMetadataVerify() {
+    return 
Boolean.parseBoolean(props.getProperty(FILE_LISTING_METADATA_VERIFY));
+  }
+
+  public Option<HoodieCompactionConfig> getMetadataCompactionConfig() throws 
IOException {

Review comment:
       Let’s also move this out?

##########
File path: 
hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
##########
@@ -755,6 +769,29 @@ public int getBootstrapParallelism() {
     return 
Integer.parseInt(props.getProperty(HoodieBootstrapConfig.BOOTSTRAP_PARALLELISM));
   }
 
+  /**
+   * File listing metadata configs.
+   */
+  public boolean useFileListingMetadata() {
+    return Boolean.parseBoolean(props.getProperty(USE_FILE_LISTING_METADATA));
+  }
+
+  public boolean getFileListingMetadataVerify() {
+    return 
Boolean.parseBoolean(props.getProperty(FILE_LISTING_METADATA_VERIFY));
+  }
+
+  public Option<HoodieCompactionConfig> getMetadataCompactionConfig() throws 
IOException {
+    String serializedCompactionConfig = 
props.getProperty(HOODIE_METADATA_COMPACTION_CONFIG);
+    if (serializedCompactionConfig == null) {
+      return Option.empty();

Review comment:
       If this is indeed a config, then define a default value?
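If the property stays in this class, the reviewer's point could be addressed by pairing it with a `DEFAULT_` constant and using the two-argument `Properties.getProperty`, so the getter never hands back `null` / `Option.empty()`. The default value below is purely illustrative.

```java
import java.util.Properties;

class ConfigDefaultSketch {
  static final String METADATA_COMPACTION_CONFIG = "hoodie.metadata.compaction.config";
  // Hypothetical default: an empty serialized config meaning "use built-ins".
  static final String DEFAULT_METADATA_COMPACTION_CONFIG = "";

  private final Properties props;

  ConfigDefaultSketch(Properties props) {
    this.props = props;
  }

  String getMetadataCompactionConfig() {
    // Fall back to the explicit default instead of returning null and making
    // every call site handle the unset case.
    return props.getProperty(METADATA_COMPACTION_CONFIG, DEFAULT_METADATA_COMPACTION_CONFIG);
  }
}
```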

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/fs/FSUtils.java
##########
@@ -174,8 +174,17 @@ public static String getRelativePartitionPath(Path 
basePath, Path fullPartitionP
   /**
    * Obtain all the partition paths, that are present in this table, denoted 
by presence of
    * {@link HoodiePartitionMetadata#HOODIE_PARTITION_METAFILE}.
+   *
+   * If thee basePathStr is a subdirectory of .hoodie folder then we assume 
that the partitions of an internal

Review comment:
       Will look closely at the usage and see if we can simplify this. 

##########
File path: 
hudi-client/src/main/java/org/apache/hudi/metadata/HoodieMetadataIndex.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.metadata;

Review comment:
       These classes that are needed for reading the metadata should probably
live in `hudi-common`, so we can use them in query engines as well.

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/fs/FSUtils.java
##########
@@ -174,8 +174,17 @@ public static String getRelativePartitionPath(Path 
basePath, Path fullPartitionP
   /**
    * Obtain all the partition paths, that are present in this table, denoted 
by presence of
    * {@link HoodiePartitionMetadata#HOODIE_PARTITION_METAFILE}.
+   *
+   * If thee basePathStr is a subdirectory of .hoodie folder then we assume 
that the partitions of an internal

Review comment:
       Typo: thee

##########
File path: hudi-client/src/main/java/org/apache/hudi/table/HoodieTable.java
##########
@@ -253,6 +254,15 @@ public SyncableFileSystemView getHoodieView() {
     return getViewManager().getFileSystemView(metaClient);
   }
 
+  private SyncableFileSystemView getFileSystemViewInternal(HoodieTimeline 
timeline) {
+    if (config.useFileListingMetadata()) {
+      FileSystemViewStorageConfig viewConfig = config.getViewStorageConfig();
+      return new HoodieMetadataFileSystemView(metaClient, timeline, 
viewConfig.isIncrementalTimelineSyncEnabled());
+    } else {
+      return getViewManager().getFileSystemView(metaClient);

Review comment:
       Can we make the viewManager hand out the metadataFSView as well?
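A sketch of that suggestion: the view manager becomes the single switch point that hands out either view based on the metadata flag, so call sites like `getFileSystemViewInternal` never branch themselves. The interface and class names are stand-ins for Hudi's real view types.

```java
interface FileSystemView {}

// Stand-ins for the two concrete views the flag chooses between.
class RemoteFileSystemView implements FileSystemView {}

class MetadataFileSystemView implements FileSystemView {}

class ViewManager {
  private final boolean useFileListingMetadata;

  ViewManager(boolean useFileListingMetadata) {
    this.useFileListingMetadata = useFileListingMetadata;
  }

  FileSystemView getFileSystemView() {
    // The only place that consults the flag; every caller goes through here.
    return useFileListingMetadata
        ? new MetadataFileSystemView()
        : new RemoteFileSystemView();
  }
}
```

Centralizing the choice also lets the manager cache views per table, which per-call-site branching would duplicate.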

##########
File path: 
hudi-common/src/main/java/org/apache/hudi/common/table/log/AbstractHoodieLogRecordScanner.java
##########
@@ -202,23 +202,21 @@ public void scan() {
                     LOG.info("Rolling back the last corrupted log block read 
in " + logFile.getPath());
                     currentInstantLogBlocks.pop();
                     numBlocksRolledBack++;
-                  } else if (lastBlock.getBlockType() != CORRUPT_BLOCK

Review comment:
       Can you explain what’s changing here in terms of corrupt block
handling?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

