[ 
https://issues.apache.org/jira/browse/GOBBLIN-2087?focusedWorklogId=923381&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-923381
 ]

ASF GitHub Bot logged work on GOBBLIN-2087:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 13/Jun/24 20:31
            Start Date: 13/Jun/24 20:31
    Worklog Time Spent: 10m 
      Work Description: AndyJiang99 commented on code in PR #3972:
URL: https://github.com/apache/gobblin/pull/3972#discussion_r1638837143


##########
gobblin-iceberg/src/main/java/org/apache/gobblin/iceberg/predicates/DatasetHiveSchemaContainsNonOptionalUnion.java:
##########
@@ -42,16 +42,19 @@
 public class DatasetHiveSchemaContainsNonOptionalUnion<T extends Dataset> 
implements CheckedExceptionPredicate<T, IOException> {
   private final HiveRegister hiveRegister;
   private final Pattern pattern;
+  private final String optionalDbName;
 
   public static final String PREFIX = 
DatasetHiveSchemaContainsNonOptionalUnion.class.getName();
   /**
    * 1st match group is assumed to be the DB and the 2nd match group the Table 
for the pattern
    */
   public static final String PATTERN = PREFIX + ".db.table.pattern";
+  public static final String OPTIONAL_DB_NAME = PREFIX + ".db.optionalDbName";
 
   public DatasetHiveSchemaContainsNonOptionalUnion(Properties properties) {
     this.hiveRegister = getHiveRegister(new State(properties));
     this.pattern = Pattern.compile(properties.getProperty(PATTERN));
+    this.optionalDbName = properties.getProperty(OPTIONAL_DB_NAME, null);

Review Comment:
   Instead of fetching and handling the default with null, can we use the java 
class Optional instead? Similar to the Optional<HiveTable>  below in the 
getTable.





Issue Time Tracking
-------------------

    Worklog Id:     (was: 923381)
    Time Spent: 20m  (was: 10m)

> Enhance DatasetHiveSchemaContainsNonOptionalUnion to Support Optional 
> Database Name
> -----------------------------------------------------------------------------------
>
>                 Key: GOBBLIN-2087
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-2087
>             Project: Apache Gobblin
>          Issue Type: Improvement
>            Reporter: pawan teja
>            Priority: Major
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> **Summary:**
> The current implementation of the `DatasetHiveSchemaContainsNonOptionalUnion` 
> class requires the database name to be extracted from the dataset URN using a 
> regex pattern. This approach limits flexibility and can lead to errors if the 
> URN format changes. To enhance the flexibility and usability of this class, 
> we need to add support for an optional database name.
> **Current Issue:**
> - The database name must be extracted from the dataset URN using a regex 
> pattern.
> - This dependency on the URN format limits flexibility and can lead to errors 
> if the format changes.
> - Users cannot specify a database name directly, which could be more 
> intuitive and flexible.
> **Proposed Solution:**
> - Introduce a new property `OPTIONAL_DB_NAME` in the 
> `DatasetHiveSchemaContainsNonOptionalUnion` class.
> - Update the constructor and methods to check for the optional database name 
> and use it if provided.
> - Add logging to indicate when the optional database name is used and when it 
> replaces the pattern-extracted database name.
> - Ensure backward compatibility by retaining the existing behavior when the 
> optional database name is not provided.
> **Acceptance Criteria:**
> - The `DatasetHiveSchemaContainsNonOptionalUnion` class should support an 
> optional database name.
> - If the optional database name is provided, it should replace the database 
> name extracted from the URN pattern.
> - The class should maintain its current functionality when the optional 
> database name is not provided.
> - Appropriate logging should be added to indicate the use of the optional 
> database name.
> - Tests should be added to verify the new functionality, including cases 
> where the optional database name is and is not provided.
> These enhancements will improve the flexibility and usability of the 
> `DatasetHiveSchemaContainsNonOptionalUnion` class, allowing for more dynamic 
> database configurations and reducing dependency on the dataset URN format.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to