andygrove opened a new issue, #3241:
URL: https://github.com/apache/datafusion-comet/issues/3241

   ## Summary
   
   Move classes that are exclusively used by Iceberg (not used by any 
non-`@IcebergApi` Comet code) from `common` into the new `iceberg-public-api` 
module. This will improve separation of concerns and make the Iceberg API 
boundary clearer.
   
   ## Classes to Move
   
   The following 4 classes are **only** referenced by other `@IcebergApi` 
classes and exist purely for Iceberg integration:
   
   | Class | Current Location | Purpose |
   |-------|------------------|---------|
   | `ParquetColumnSpec` | `common/src/main/java/org/apache/comet/parquet/` | 
Bridge class to pass column descriptors between Comet and Iceberg (since 
Iceberg shades Parquet) |
   | `RowGroupReader` | `common/src/main/java/org/apache/comet/parquet/` | 
PageReadStore implementation for row group reading |
   | `WrappedInputFile` | `common/src/main/java/org/apache/comet/parquet/` | 
Wraps Iceberg's InputFile to implement Parquet's InputFile interface |
   | `AbstractCometSchemaImporter` | `common/src/main/java/org/apache/arrow/c/` 
| Base class for schema import functionality |
   
   ## Important Considerations
   
   ### Packaging/Shading Concerns
   
   ⚠️ **This change requires careful attention to packaging and shading:**
   
   1. **Iceberg shades Parquet** - `ParquetColumnSpec` exists specifically 
because Iceberg uses shaded Parquet classes. Any module structure changes must 
preserve this compatibility.
   
   2. **JAR structure** - Ensure the classes remain accessible to Iceberg after 
the move. The final packaged JAR must include these classes in a location 
Iceberg can access.
   
   3. **Maven shade plugin configuration** - Review and update shade plugin 
configuration if needed to ensure:
      - Classes are included in the shaded JAR
      - Package relocations don't break Iceberg's access
      - No duplicate classes across modules
   
   4. **Dependency direction** - The `iceberg-public-api` module currently only 
contains tests. If we move source classes there, we need to:
      - Update module dependencies
      - Ensure `common` module can still depend on these classes if needed
      - Consider if a separate `iceberg-api` module (non-test) is more 
appropriate
   
   ### Testing Requirements
   
   - [ ] Verify Iceberg integration tests pass after the move
   - [ ] Verify shaded JAR contains all necessary classes
   - [ ] Test with actual Iceberg to ensure no classloader issues
   - [ ] Run `iceberg-public-api` module tests
   
   ## Alternative Approaches
   
   1. **Keep in `common`, just document** - Leave classes in `common` but 
document they're Iceberg-specific
   2. **New `iceberg-api` source module** - Create a separate source module 
(not test-only) for Iceberg API classes
   3. **Subpackage organization** - Move to `org.apache.comet.iceberg.*` 
package within `common`
   
   ## Related
   
   - PR #3240 - Added `iceberg-public-api` test module
   - PR #3237 - Added `@IcebergApi` annotation


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to