DanielCarter-stack commented on PR #10401:
URL: https://github.com/apache/seatunnel/pull/10401#issuecomment-3803443879

   <!-- code-pr-reviewer -->
   <!-- cpr:pr_reply_v2_parts {"group": "apache/seatunnel#10401", "part": 1, 
"total": 1} -->
   ### Issue 1: Source Document Used Sink Configuration (CRITICAL BUG)
   
   **Location**: `docs/en/connectors/source/Iceberg.md:201-249`
   
   **Modified code**:
   ```hocon
   +```hocon
   +sink {  # ← 错误:应该是 source {
   +  Iceberg {
   +    ...
   +    iceberg.table.write-props = {  # ← 错误:Source 不支持
   +      write.format.default = "parquet"
   +      write.target-file-size-bytes = 536870912
   +    }
   +    iceberg.table.primary-keys = "id"  # ← 错误:Source 不支持
   +    iceberg.table.partition-keys = "f_datetime"  # ← 错误:Source 不支持
   +    iceberg.table.upsert-mode-enabled = true  # ← 错误:Source 不支持
   +    iceberg.table.schema-evolution-enabled = true  # ← 错误:Source 不支持
   ```
   
   **Problem description**:
   The Source document incorrectly used the Sink configuration example, which 
includes multiple configuration items that are only valid for Sink.
   
   **Potential risks**:
   - Users copying the configuration to Source scenarios will cause 
configuration parsing to fail or be ignored
   - Confuses the functional differences between Source and Sink
   
   **Impact scope**:
   - **Direct impact**: All users referencing the Source document
   - **Impact surface**: Iceberg Connector Source users
   
   **Severity**: CRITICAL (Serious documentation error that will directly 
mislead users)
   
   **Improvement suggestions**:
   ```hocon
   +```hocon
   +source {
   +  Iceberg {
   +    catalog_name = "seatunnel_test"
   +    iceberg.catalog.config = {
   +      warehouse     = "s3://your-bucket/warehouse/"
   +      catalog-impl  = "org.apache.iceberg.aws.glue.GlueCatalog"
   +      io-impl       = "org.apache.iceberg.aws.s3.S3FileIO"
   +      client.region = "your-region"
   +      
   +      # Properties below must be supplied when using custom glue endpoints 
and static credentials
   +      # Class Name of the provider you want to use
   +      # We provide a StaticAwsCredentials provider that you may use.
   +      # If you need a different authentication method, you will need to 
provide
   +      # your custom implementation and ensure that it is available in 
SEATUNNEL_HOME/lib
   +      
   +      
client.credentials-provider="org.apache.seatunnel.connectors.seatunnel.iceberg.aws.StaticAwsCredentialsProvider"
   +      
   +      # Provide your AWS Access Key Id and Secret Access Key in the 
properties below
   +      client.credentials-provider.access-key-id = "YOUR_ACCESS_KEY"
   +      client.credentials-provider.secret-access-key = 
"YOUR_SECRET_ACCESS_KEY"
   +      
   +      # These additional settings may need to be set for your glue 
compatible catalog
   +      glue.endpoint = "https://hostname:port";
   +      glue.id = "YOUR_GLUE_CATALOG_ID"
   +    }
   +    namespace = "seatunnel_namespace"
   +    table     = "iceberg_source_table"
   +    case_sensitive = true
   +  }
   +}
   ```
   
   **Reasons**:
   1. Source configuration does not include Sink-specific configurations such 
as `write-props`, `primary-keys`, `partition-keys`, etc.
   2. Need to reference the existing Source document example structure
   3. Configuration examples should be concise, only showing the credential 
configuration portion
   
   ---
   
   ### Issue 2: Spelling and Grammar Errors in Documentation (MAJOR)
   
   **Location**:
   - `docs/en/connectors/sink/Iceberg.md:214-218`
   - `docs/en/connectors/source/Iceberg.md:214-218`
   
   **Modified code**:
   ```markdown
   +      # We provide a StaticAWSCredential provider that you may use.
   +      # Provide you AWS Access Key Id and Secret Access Key in the 
properties below
   +      # These additional settings may need to be set for your glue 
compatable catalog
   ```
   
   **Related issues**:
   - "StaticAWSCredential" → Should be "StaticAwsCredentials" (consistent with 
class name)
   - "Provide you AWS" → Should be "Provide your AWS"
   - "compatable" → Should be "compatible"
   
   **Problem description**:
   There are multiple spelling and grammar errors in the documentation 
comments, affecting the professionalism of the documentation.
   
   **Potential risks**:
   - Reduces documentation quality and affects user experience
   - May cause users to copy incorrect class names
   
   **Impact scope**:
   - **Direct impact**: Users reading the documentation
   - **Impact surface**: Documentation quality
   
   **Severity**: MAJOR (Documentation quality issue)
   
   **Improvement suggestions**:
   ```markdown
   +      # We provide a StaticAwsCredentials provider that you may use.
   +      # Provide your AWS Access Key Id and Secret Access Key in the 
properties below
   +      # These additional settings may need to be set for your glue 
compatible catalog
   ```
   
   **Reasons**:
   1. Maintain consistency with actual class names
   2. Correct grammar errors and improve documentation quality
   3. Prevent users from copying incorrect class names and causing 
configuration failures
   
   ---
   
   ### Issue 3: Extra Blank Lines at End of Documentation (MINOR)
   
   **Location**:
   - `docs/en/connectors/sink/Iceberg.md:247-251`
   - `docs/en/connectors/source/Iceberg.md:247-251`
   
   **Modified code**:
   ```markdown
   +      
   +      
   +      
   +      
   +      
   +    }
   ```
   
   **Problem description**:
   There are 5 extra blank lines before the `}` in the configuration example.
   
   **Potential risks**:
   - Affects documentation format and readability
   - Does not comply with documentation standards
   
   **Impact scope**:
   - **Direct impact**: Documentation format
   - **Impact surface**: Documentation quality
   
   **Severity**: MINOR (Format issue)
   
   **Improvement suggestions**:
   Delete extra blank lines to keep the documentation clean.
   
   **Reasons**:
   1. Extra blank lines are meaningless
   2. Affect documentation readability
   3. Do not comply with documentation standards
   
   ---
   
   ### Issue 4: Missing JavaDoc Comments (MAJOR)
   
   **Location**: 
`seatunnel-connectors-v2/connector-iceberg/src/main/java/org/apache/seatunnel/connectors/seatunnel/iceberg/aws/StaticAwsCredentialsProvider.java`
   
   **Modified code**:
   ```java
   package org.apache.seatunnel.connectors.seatunnel.iceberg.aws;
   
   import software.amazon.awssdk.auth.credentials.AwsBasicCredentials;
   import software.amazon.awssdk.auth.credentials.AwsCredentials;
   import software.amazon.awssdk.auth.credentials.AwsCredentialsProvider;
   import software.amazon.awssdk.auth.credentials.StaticCredentialsProvider;
   
   import java.util.Map;
   
   public class StaticAwsCredentialsProvider implements AwsCredentialsProvider {
       // ← Missing class-level JavaDoc
   ```
   
   **Problem description**:
   The newly added class does not have JavaDoc comments, which does not comply 
with Apache project code standards.
   
   **Potential risks**:
   - Other developers find it difficult to understand the purpose and usage of 
the class
   - IDE-generated API documentation will lack key information
   - Reduces code maintainability
   
   **Impact scope**:
   - **Direct impact**: Code maintainability
   - **Impact surface**: Single class
   
   **Severity**: MAJOR (Violation of code standards)
   
   **Improvement suggestions**:
   ```java
   /**
    * An AWS credentials provider that provides static AWS credentials from 
configuration properties.
    * 
    * <p>This class is designed to work with Iceberg's AWS integration, 
specifically for Glue Catalog
    * scenarios where you need to provide static AWS credentials that are 
different from the S3 credentials.
    * 
    * <p>Usage in Iceberg catalog configuration:
    * <pre>
    * iceberg.catalog.config = {
    *   client.credentials-provider = 
"org.apache.seatunnel.connectors.seatunnel.iceberg.aws.StaticAwsCredentialsProvider"
    *   client.credentials-provider.access-key-id = "YOUR_ACCESS_KEY"
    *   client.credentials-provider.secret-access-key = "YOUR_SECRET_ACCESS_KEY"
    * }
    * </pre>
    * 
    * <p>Note: The property keys are passed without the 
"client.credentials-provider." prefix.
    * 
    * @see software.amazon.awssdk.auth.credentials.AwsCredentialsProvider
    * @see org.apache.iceberg.aws.glue.GlueCatalog
    */
   public class StaticAwsCredentialsProvider implements AwsCredentialsProvider {
       
       /**
        * Factory method called by Iceberg to create the credentials provider.
        * 
        * <p>Iceberg removes the "client.credentials-provider." prefix from all 
properties
        * before passing them to this method. For example, 
"client.credentials-provider.access-key-id"
        * becomes "access-key-id" in the properties map.
        * 
        * @param properties Configuration properties with the prefix removed. 
Must contain
        *                   "access-key-id" and "secret-access-key" keys.
        * @return A new StaticAwsCredentialsProvider instance
        * @throws IllegalArgumentException if required properties are missing
        */
       public static StaticAwsCredentialsProvider create(Map<String, String> 
properties) {
           return new StaticAwsCredentialsProvider(properties);
       }
   ```
   
   **Reasons**:
   1. Apache projects require JavaDoc for all public APIs
   2. Clear documentation helps users understand how to use this class
   3. Explain the attribute prefix processing logic to avoid confusion
   4. Provide usage examples to reduce user learning curve
   
   ---
   
   ### Issue 5: Empty String Credentials Not Validated (MINOR)
   
   **Location**: 
`seatunnel-connectors-v2/connector-iceberg/src/main/java/org/apache/seatunnel/connectors/seatunnel/iceberg/aws/StaticAwsCredentialsProvider.java:38-42`
   
   **Modified code**:
   ```java
   String accessKey = properties.get("access-key-id");
   String secretKey = properties.get("secret-access-key");
   
   if (accessKey == null || secretKey == null) {  // ← Only checks null, does 
not check empty string
       throw new IllegalArgumentException(
               "Missing credentials. Use 
'client.credentials-provider.access-key-id' "
                       + "and 'client.credentials-provider.secret-access-key' 
in your catalog configuration.");
   }
   ```
   
   **Problem description**:
   Only checks whether credentials are null, not whether they are empty 
strings. Users may mistakenly configure them as empty strings.
   
   **Potential risks**:
   - Empty string credentials will be processed as `""` by `.trim()`, then 
passed to AWS SDK
   - AWS SDK may only throw errors when called, with unclear error messages
   - Delays error detection timing
   
   **Impact scope**:
   - **Direct impact**: Error prompt quality when configuration is wrong
   - **Impact surface**: Users using static credentials
   
   **Severity**: MINOR (Incomplete boundary condition handling)
   
   **Improvement suggestions**:
   ```java
   String accessKey = properties.get("access-key-id");
   String secretKey = properties.get("secret-access-key");
   
   if (accessKey == null || accessKey.trim().isEmpty() || 
       secretKey == null || secretKey.trim().isEmpty()) {
       throw new IllegalArgumentException(
               "Missing or empty credentials. Use 
'client.credentials-provider.access-key-id' "
                       + "and 'client.credentials-provider.secret-access-key' 
in your catalog configuration.");
   }
   
   this.delegate =
           StaticCredentialsProvider.create(
                   AwsBasicCredentials.create(accessKey.trim(), 
secretKey.trim()));
   ```
   
   **Reasons**:
   1. Detect configuration errors early
   2. Provide clearer error messages
   3. Avoid delaying errors until AWS calls
   
   ---
   
   ### Issue 6: Missing Logging (MINOR)
   
   **Location**: 
`seatunnel-connectors-v2/connector-iceberg/src/main/java/org/apache/seatunnel/connectors/seatunnel/iceberg/aws/StaticAwsCredentialsProvider.java`
   
   **Problem description**:
   The class does not have any logging, which is not conducive to debugging and 
troubleshooting.
   
   **Potential risks**:
   - Unable to track the creation process of credential providers
   - Difficult to diagnose problems when configuration errors occur
   
   **Impact scope**:
   - **Direct impact**: Observability and debuggability
   - **Impact surface**: Troubleshooting efficiency
   
   **Severity**: MINOR (Insufficient observability)
   
   **Improvement suggestions**:
   ```java
   import lombok.extern.slf4j.Slf4j;
   
   @Slf4j
   public class StaticAwsCredentialsProvider implements AwsCredentialsProvider {
       
       public static StaticAwsCredentialsProvider create(Map<String, String> 
properties) {
           log.debug("Creating StaticAwsCredentialsProvider with {} 
properties", properties.size());
           return new StaticAwsCredentialsProvider(properties);
       }
   
       private StaticAwsCredentialsProvider(Map<String, String> properties) {
           String accessKey = properties.get("access-key-id");
           String secretKey = properties.get("secret-access-key");
   
           if (accessKey == null || secretKey == null) {
               log.error("Missing credentials. access-key-id: {}, 
secret-access-key: {}", 
                        accessKey != null, secretKey != null);
               throw new IllegalArgumentException(
                       "Missing credentials. Use 
'client.credentials-provider.access-key-id' "
                               + "and 
'client.credentials-provider.secret-access-key' in your catalog 
configuration.");
           }
   
           this.delegate =
                   StaticCredentialsProvider.create(
                           AwsBasicCredentials.create(accessKey.trim(), 
secretKey.trim()));
           log.info("StaticAwsCredentialsProvider created successfully");
       }
   ```
   
   **Reasons**:
   1. DEBUG level logs help track the creation process
   2. ERROR level logs record credential missing details (without exposing 
sensitive information)
   3. INFO level logs confirm successful creation
   4. Improve observability and debuggability
   
   ---
   
   ### Issue 7: Missing E2E Tests (MAJOR)
   
   **Location**: 
`seatunnel-e2e/seatunnel-connector-v2-e2e/connector-iceberg-e2e/`
   
   **Problem description**:
   The PR Checklist requires "Add e2e testcase in seatunnel-e2e", but none were 
actually added.
   
   **Potential risks**:
   - Unable to verify integration in real environments
   - Unable to discover integration issues with other components
   - Reduced confidence in code quality
   
   **Impact scope**:
   - **Direct impact**: Test coverage and code quality assurance
   - **Impact surface**: Iceberg Connector
   
   **Severity**: MAJOR (Incomplete Checklist)
   
   **Improvement suggestions**:
   Add an E2E test class, for example:
   ```java
   // 
seatunnel-e2e/seatunnel-connector-v2-e2e/connector-iceberg-e2e/src/test/java/org/apache/seatunnel/e2e/connector/iceberg/IcebergGlueCatalogIT.java
   public class IcebergGlueCatalogIT {
       
       @Test
       public void testIcebergWithStaticCredentials() {
           // Test scenario for connecting to Glue Catalog using static 
credentials
           // Requires local Glue environment or Mock
       }
   }
   ```
   
   **Reasons**:
   1. Comply with PR Checklist requirements
   2. Verify end-to-end integration
   3. Prevent regression issues
   4. Increase code quality confidence
   
   **Note**: If E2E tests require a real AWS environment, you can use:
   - LocalStack (local AWS service simulation)
   - Testcontainers (if supports Glue)
   - Mock AWS services
   
   ---
   
   ### Issue 8: Unnecessary pom.xml Changes (CRITICAL)
   
   **Location**:
   - `seatunnel-connectors-v2/connector-jdbc/pom.xml:44`
   - `seatunnel-connectors-v2/connector-s3-redshift/pom.xml:33`
   - `seatunnel-dist/pom.xml:101`
   
   **Modified code**:
   ```xml
   -        <redshift.version>2.1.0.30</redshift.version>
   +        <redshift.version>2.1.0.9</redshift.version>
   ```
   
   **Problem description**:
   This PR is unrelated to the Redshift JDBC driver, but downgraded the 
Redshift driver version. The dev branch recently upgraded to 2.1.0.30 to fix 
OOM issues (commit 6a3fdf396).
   
   **Potential risks**:
   - Rolled back OOM fix (commit 6a3fdf396)
   - May cause users using Redshift to encounter OOM issues again
   - Violates the principle of minimal changes
   
   **Impact scope**:
   - **Direct impact**: Users using Redshift JDBC
   - **Impact surface**: JDBC Connector, S3 Redshift Connector
   - **Indirect impact**: All users depending on seatunnel-dist
   
   **Severity**: CRITICAL (Introduced serious regression risk)
   
   **Improvement suggestions**:
   **Completely remove the changes to these three pom.xml**, keep the dev 
branch version:
   ```xml
   <redshift.version>2.1.0.30</redshift.version>
   ```
   
   **Reasons**:
   1. The purpose of this PR is to add credential providers, unrelated to 
Redshift driver
   2. Downgrading will roll back OOM fixes, causing serious regressions
   3. Violates the principle of minimal changes
   4. May cause merge conflicts
   
   ---
   
   ### Issue 9: Unclear Purpose of parquet Dependency Addition (MINOR)
   
   **Location**: 
`seatunnel-e2e/seatunnel-connector-v2-e2e/connector-iceberg-e2e/pom.xml:111-121`
   
   **Modified code**:
   ```xml
   +        <dependency>
   +            <groupId>org.apache.parquet</groupId>
   +            <artifactId>parquet-avro</artifactId>
   +            <version>${parquet.version}</version>
   +            <scope>test</scope>
   +        </dependency>
   +        <dependency>
   +            <groupId>org.apache.parquet</groupId>
   +            <artifactId>parquet-hadoop</artifactId>
   +            <version>${parquet.version}</version>
   +            <scope>test</scope>
   +        </dependency>
   ```
   
   **Problem description**:
   Added parquet dependency but did not add corresponding E2E test code.
   
   **Potential risks**:
   - Unnecessary dependency increase
   - If for E2E testing, test code should be added simultaneously
   
   **Impact scope**:
   - **Direct impact**: Dependency management
   - **Impact surface**: E2E module
   
   **Severity**: MINOR (Dependency management issue)
   
   **Improvement suggestions**:
   1. If for supporting future E2E tests, test code should be added
   2. If not needed for now, these dependencies should be removed
   3. Add comments explaining why these dependencies are needed
   
   **Reasons**:
   1. Keep changes minimal
   2. Avoid unnecessary dependencies
   3. Improve change reviewability
   
   ---
   
   ### Issue 10: Risk of AWS SDK Dependency Scope as Provided (MAJOR)
   
   **Location**: `seatunnel-connectors-v2/connector-iceberg/pom.xml:82-94`
   
   **Modified code**:
   ```xml
   <dependency>
       <groupId>software.amazon.awssdk</groupId>
       <artifactId>auth</artifactId>
       <version>${software.amazon.awssdk.version}</version>
       <scope>provided</scope>
   </dependency>
   
   <dependency>
       <groupId>software.amazon.awssdk</groupId>
       <artifactId>identity-spi</artifactId>
       <version>${software.amazon.awssdk.version}</version>
       <scope>provided</scope>
   </dependency>
   ```
   
   **Problem description**:
   The newly added AWS SDK dependencies use `provided` scope, meaning they must 
be provided by the runtime environment. If users lack these dependencies in 
their classpath, the `StaticAwsCredentialsProvider` class will fail to load.
   
   **Context analysis**:
   - Existing dependencies: `glue`, `s3`, `sts` all use default scope (compile)
   - Iceberg library may transitively introduce some AWS SDK dependencies
   - But uncertain whether it includes `auth` and `identity-spi`
   
   **Potential risks**:
   - If the runtime environment lacks these dependencies, it will cause 
`NoClassDefFoundError`
   - Error messages may not be clear enough, making it difficult for users to 
troubleshoot
   - Violates the "out-of-the-box" principle
   
   **Impact scope**:
   - **Direct impact**: Users using static credential functionality
   - **Impact surface**: Iceberg Connector
   
   **Severity**: MAJOR (Potential class loading failure risk)
   
   **Improvement suggestions**:
   1. **Verify dependency transitivity**: Confirm whether Iceberg or other 
dependencies transitively introduce `auth` and `identity-spi`
   2. **Adjust scope**: If not transitively introduced, should change to 
compile scope
   3. **Add documentation**: If keeping provided scope, need to document that 
users must add these dependencies themselves
   
   **Verification method**:
   ```bash
   mvn dependency:tree -Dverbose | grep -E "(auth|identity-spi)"
   ```
   
   **Reasons**:
   1. Ensure successful class loading
   2. Provide clear error prompts
   3. Comply with "out-of-the-box" principle
   
   ---
   
   ### Issue 11: Documentation Comment Accuracy Issues (MINOR)
   
   **Location**: `docs/en/connectors/sink/Iceberg.md:211` and 
`docs/en/connectors/source/Iceberg.md:211`
   
   **Modified code**:
   ```markdown
   +      # Properties below must be supplied when using custom glue endpoints 
and static credentials
   ```
   
   **Problem description**:
   The comment says "must be supplied", but actually these properties are 
optional:
   - `client.credentials-provider`: Only needed when custom credential 
providers are required
   - `glue.endpoint` and `glue.id`: Only needed when custom endpoint or catalog 
ID is required
   
   **Potential risks**:
   - Mislead users into thinking these are required fields
   - Cause unnecessary configuration confusion
   
   **Impact scope**:
   - **Direct impact**: Documentation accuracy
   - **Impact surface**: User understanding
   
   **Severity**: MINOR (Documentation accuracy issue)
   
   **Improvement suggestions**:
   ```markdown
   +      # Optional properties for custom glue endpoints and static credentials
   +      # Use these when your Glue catalog requires different credentials 
than S3
   +      # or when you need to use a custom Glue endpoint
   ```
   
   **Reasons**:
   1. Accurately reflect the required nature of properties
   2. Help users understand when configuration is needed
   3. Improve documentation quality
   
   ---
   
   ### Issue 12: Missing Security Reminders for Credentials (MINOR)
   
   **Location**: `docs/en/connectors/sink/Iceberg.md` and 
`docs/en/connectors/source/Iceberg.md`
   
   **Problem description**:
   The documentation does not remind users to protect credential security and 
prevent credential leakage.
   
   **Potential risks**:
   - Users may commit configuration files containing credentials to version 
control systems
   - Credential leakage may lead to security incidents
   
   **Impact scope**:
   - **Direct impact**: Security
   - **Impact surface**: All users using this feature
   
   **Severity**: MINOR (Security recommendation)
   
   **Improvement suggestions**:
   Add security reminders in the documentation:
   ```markdown
   +⚠️ **Security Note**: Never commit configuration files containing 
credentials to version control.
   +Use environment variables or secret management tools to store sensitive 
information.
   +Consider using AWS IAM roles instead of static credentials when possible.
   ```
   
   **Reasons**:
   1. Raise security awareness
   2. Prevent credential leakage
   3. Guide users to use more secure solutions (such as IAM roles)
   
   ---


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to