ctubbsii commented on code in PR #3228:
URL: https://github.com/apache/accumulo/pull/3228#discussion_r1281555487


##########
test/src/main/java/org/apache/accumulo/test/ExportTableCommandWithMultipleVolumesIT.java:
##########
@@ -0,0 +1,156 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   https://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.test;
+
+import static org.junit.jupiter.api.Assertions.assertTrue;
+
+import java.io.File;
+import java.time.Duration;
+import java.util.Map;
+import java.util.SortedSet;
+import java.util.TreeSet;
+
+import org.apache.accumulo.core.client.Accumulo;
+import org.apache.accumulo.core.client.AccumuloClient;
+import org.apache.accumulo.core.client.BatchWriter;
+import org.apache.accumulo.core.client.Scanner;
+import org.apache.accumulo.core.conf.Property;
+import org.apache.accumulo.core.data.Key;
+import org.apache.accumulo.core.data.Mutation;
+import org.apache.accumulo.core.data.Range;
+import org.apache.accumulo.core.data.Value;
+import org.apache.accumulo.core.metadata.MetadataTable;
+import org.apache.accumulo.core.metadata.schema.MetadataSchema;
+import org.apache.accumulo.core.security.Authorizations;
+import org.apache.accumulo.harness.AccumuloClusterHarness;
+import org.apache.accumulo.miniclusterImpl.MiniAccumuloConfigImpl;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.RawLocalFileSystem;
+import org.apache.hadoop.io.Text;
+import org.junit.jupiter.api.Test;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class ExportTableCommandWithMultipleVolumesIT extends AccumuloClusterHarness {
+  private static final Logger log =
+      LoggerFactory.getLogger(ExportTableCommandWithMultipleVolumesIT.class);
+
+  Path v1, v2;
+
+  public static String[] row_numbers = "1,2,3,4,5,6,7,8,9,10".split(",");
+
+  String baseDirStr = "";
+  String baseDir2Str = "";
+  String originalVolume = "";
+  String secondVolume = "";
+
+  @Override
+  protected Duration defaultTimeout() {
+    return Duration.ofMinutes(1);
+  }
+
+  @Override
+  public void configureMiniCluster(MiniAccumuloConfigImpl cfg, Configuration hadoopCoreSite) {
+    File baseDir = cfg.getDir();
+
+    // get first volume name
+    baseDirStr = baseDir.toString();
+    String[] baseDirArray = baseDirStr.split("/");
+    originalVolume = baseDirArray[2];
+
+    // get second volume name
+    String[] baseDir2Array = baseDirArray;
+    baseDir2Array[2] = baseDir2Array[2] + "2";

Review Comment:
   I'm not sure if you're saying that the code in this PR is creating separate
   distcp files whose names collide (which shouldn't happen... this PR should
   create uniquely named files if it's creating more than one as the proposed
   solution), or if you're saying that all the local files look like they're
   part of the same volume and end up in the same file, because everything
   under `file:///` looks like a single filesystem. I think I would need
   specific file names and resulting file contents, as an example, to
   understand what you're saying.
   
   However, I've noticed another problem with the current implementation. It
   seems to loop over the currently configured volumes and write out a separate
   file for each configured volume that has files belonging to it. But what
   about files in the metadata that are not part of any currently configured
   volume? The number, name, and organization of the output files should be
   based on the volumes represented in the absolute paths of the files seen in
   the metadata table... not grouped merely by whether they match one of the
   currently configured volumes. The current implementation looks like it would
   simply ignore such files.
   
   Without diving in to implement this myself, I'm not exactly sure what the
   expected behavior is for `file:///` paths, but as basic functionality I
   would expect the following set of mock files seen in the metadata to form 3
   separate groups, each with 2 files:
   
   ```
   group1:
   hdfs://nn1:8020/accumulo/path/to/table1/tablet1/file1.rf
   hdfs://nn1:8020/accumulo/path/to/table1/tablet2/file1.rf
   
   group2:
   hdfs://nn2:8020/accumulo/path/to/table1/tablet3/file1.rf
   hdfs://nn2:8020/accumulo/path/to/table1/tablet4/file1.rf
   
   group3:
   hdfs://nn3:8020/accumulo/path/to/table1/tablet5/file1.rf
   hdfs://nn3:8020/accumulo/path/to/table1/tablet6/file1.rf
   ```
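
   To make the grouping concrete, here's a minimal sketch (not the PR's code;
   all names are illustrative) that derives each file's volume from the scheme
   and authority of its absolute path and groups on that, so a volume no longer
   in `instance.volumes` still gets its own group. Note that `file:///` paths
   have no authority, which is exactly the ambiguity mentioned above and would
   need explicit handling:

   ```
   import java.net.URI;
   import java.util.List;
   import java.util.Map;
   import java.util.TreeMap;
   import java.util.stream.Collectors;

   public class GroupByVolumeSketch {
     // The volume is the scheme + authority of the absolute path, e.g.
     // hdfs://nn1:8020. For file:/// paths, getAuthority() returns null, so
     // they would all collapse into a single "file://" group here.
     static String volumeOf(String file) {
       URI uri = URI.create(file);
       String authority = uri.getAuthority() == null ? "" : uri.getAuthority();
       return uri.getScheme() + "://" + authority;
     }

     public static void main(String[] args) {
       List<String> metadataFiles = List.of(
           "hdfs://nn1:8020/accumulo/path/to/table1/tablet1/file1.rf",
           "hdfs://nn1:8020/accumulo/path/to/table1/tablet2/file1.rf",
           "hdfs://nn2:8020/accumulo/path/to/table1/tablet3/file1.rf",
           "hdfs://nn2:8020/accumulo/path/to/table1/tablet4/file1.rf",
           "hdfs://nn3:8020/accumulo/path/to/table1/tablet5/file1.rf",
           "hdfs://nn3:8020/accumulo/path/to/table1/tablet6/file1.rf");

       // One group per volume actually seen in the metadata, regardless of
       // which volumes are currently configured in instance.volumes.
       Map<String,List<String>> groups = metadataFiles.stream()
           .collect(Collectors.groupingBy(GroupByVolumeSketch::volumeOf,
               TreeMap::new, Collectors.toList()));

       groups.forEach((volume, files) ->
           System.out.println(volume + " -> " + files.size() + " files"));
     }
   }
   ```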
   
   This should hold even if only `hdfs://nn1:8020` were currently configured in
   `instance.volumes`, and `nn2` and `nn3` were previous volumes that are being
   decommissioned but still have data on them. This should also account for
   volume replacements (`instance.volumes.replacements`).
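
   As a correspondingly hedged sketch of that last point (the replacement map
   below is mock data standing in for `instance.volumes.replacements`, not read
   from a real instance), each path would be normalized through the
   replacements before grouping, so files on a replaced volume land in their
   replacement's group instead of being ignored:

   ```
   import java.util.Map;

   public class VolumeReplacementSketch {
     // Mock stand-in for the instance.volumes.replacements configuration.
     static final Map<String,String> REPLACEMENTS =
         Map.of("hdfs://old-nn:8020", "hdfs://nn1:8020");

     // Rewrite the volume prefix of a path if it is on a replaced volume.
     static String applyReplacements(String file) {
       for (Map.Entry<String,String> e : REPLACEMENTS.entrySet()) {
         if (file.startsWith(e.getKey() + "/")) {
           return e.getValue() + file.substring(e.getKey().length());
         }
       }
       return file;
     }

     public static void main(String[] args) {
       // Prints hdfs://nn1:8020/accumulo/path/to/table1/tablet1/file1.rf
       System.out.println(applyReplacements(
           "hdfs://old-nn:8020/accumulo/path/to/table1/tablet1/file1.rf"));
     }
   }
   ```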
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
