khandelwal-prateek commented on code in PR #4069:
URL: https://github.com/apache/gobblin/pull/4069#discussion_r1809763385


##########
gobblin-data-management/src/test/java/org/apache/gobblin/data/management/copy/hive/HiveDatasetFinderTest.java:
##########
@@ -215,6 +215,32 @@ public void testDatasetConfig() throws Exception {
 
   }
 
+  @Test
+  public void testHiveTableFolderFilter() throws Exception {
+    List<HiveDatasetFinder.DbAndTable> dbAndTables = Lists.newArrayList();
+    dbAndTables.add(new HiveDatasetFinder.DbAndTable("db1", "table1"));
+    HiveMetastoreClientPool pool = getTestPool(dbAndTables);
+
+    Properties properties = new Properties();
+    properties.put(HiveDatasetFinder.HIVE_DATASET_PREFIX + "." + 
WhitelistBlacklist.WHITELIST, "");
+    // Try a regex with multiple groups
+    properties.put(HiveDatasetFinder.TABLE_FOLDER_ALLOWLIST_FILTER, 
"(/tmp/|a).*");
+
+    HiveDatasetFinder finder = new 
TestHiveDatasetFinder(FileSystem.getLocal(new Configuration()), properties, 
pool);
+    List<HiveDataset> datasets = 
Lists.newArrayList(finder.getDatasetsIterator());
+
+    Assert.assertEquals(datasets.size(), 1);
+
+    properties.put(HiveDatasetFinder.HIVE_DATASET_PREFIX + "." + 
WhitelistBlacklist.WHITELIST, "");
+    // The table located at /tmp/test should be filtered

Review Comment:
   it would be helpful to call out that the dataset table is created at path 
`/tmp/test` on line 221 since there is another assertion above using a 
different regex which doesn't filter the table 



##########
gobblin-data-management/src/test/java/org/apache/gobblin/data/management/copy/hive/HiveDatasetFinderTest.java:
##########
@@ -215,6 +215,32 @@ public void testDatasetConfig() throws Exception {
 
   }
 
+  @Test
+  public void testHiveTableFolderFilter() throws Exception {

Review Comment:
   can be renamed to `testHiveTableFolderAllowlistFilter`



##########
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/hive/HiveDatasetFinder.java:
##########
@@ -294,6 +304,12 @@ protected HiveDataset computeNext() {
     };
   }
 
+  protected static boolean shouldAllowTableLocation(Optional<String> regex, 
Table table) {
+    if (!regex.isPresent()) {
+      return true;
+    }
+    return 
Pattern.compile(regex.get()).matcher(table.getSd().getLocation()).matches();

Review Comment:
   Pattern.compile(regex.get()) is called every time the method 
shouldAllowTableLocation is executed when the regex is present.  Compiling a 
regex every time can be inefficient, we can compile the pattern once and store 
it, eg.:
   `private static Pattern compiledAllowlistPattern = 
regex.map(Pattern::compile).orElse(null);`



##########
gobblin-data-management/src/test/java/org/apache/gobblin/data/management/copy/hive/HiveDatasetFinderTest.java:
##########
@@ -215,6 +215,32 @@ public void testDatasetConfig() throws Exception {
 
   }
 
+  @Test
+  public void testHiveTableFolderFilter() throws Exception {
+    List<HiveDatasetFinder.DbAndTable> dbAndTables = Lists.newArrayList();
+    dbAndTables.add(new HiveDatasetFinder.DbAndTable("db1", "table1"));
+    HiveMetastoreClientPool pool = getTestPool(dbAndTables);
+
+    Properties properties = new Properties();
+    properties.put(HiveDatasetFinder.HIVE_DATASET_PREFIX + "." + 
WhitelistBlacklist.WHITELIST, "");
+    // Try a regex with multiple groups
+    properties.put(HiveDatasetFinder.TABLE_FOLDER_ALLOWLIST_FILTER, 
"(/tmp/|a).*");
+
+    HiveDatasetFinder finder = new 
TestHiveDatasetFinder(FileSystem.getLocal(new Configuration()), properties, 
pool);
+    List<HiveDataset> datasets = 
Lists.newArrayList(finder.getDatasetsIterator());
+
+    Assert.assertEquals(datasets.size(), 1);
+
+    properties.put(HiveDatasetFinder.HIVE_DATASET_PREFIX + "." + 
WhitelistBlacklist.WHITELIST, "");
+    // The table located at /tmp/test should be filtered
+    properties.put(HiveDatasetFinder.TABLE_FOLDER_ALLOWLIST_FILTER, "/a/b");
+
+    finder = new TestHiveDatasetFinder(FileSystem.getLocal(new 
Configuration()), properties, pool);
+    datasets = Lists.newArrayList(finder.getDatasetsIterator());
+
+    Assert.assertEquals(datasets.size(), 0);

Review Comment:
   can also add a test for the case where the regex is empty or null



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@gobblin.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to