jhrotko commented on PR #892:
URL: https://github.com/apache/arrow-java/pull/892#issuecomment-3501493422

   # Design Analysis: Extension Type Writer Implementation
   
   ## Executive Summary
   
   This PR represents a significant architectural improvement in how Arrow Java 
handles extension type writers. The change moves from a **factory-based 
pattern** to a **type-based pattern**, where the `ArrowType.ExtensionType` 
itself provides the writer implementation. This simplifies the API, improves 
extensibility, and makes it easier to implement custom extension types outside 
of arrow-java.
   
   ---
   
   ## Problem Statement
   
   ### The Original Challenge
   
   In Arrow's type system, each `MinorType` (INT, FLOAT, VARCHAR, etc.) has a 
corresponding writer implementation. However, **extension types** are special:
   
   - They all share the same `MinorType.EXTENSIONTYPE`
   - But each extension type (UUID, Opaque, custom types) needs a **different 
writer implementation**
   - The system needed a way to map from a specific extension type to its writer
   
   ### Previous Solution: Factory Pattern
   
   The original implementation (commits `34060eb4`, `7a7e4edd`, `7fe36d70`, 
`8663ffc6`) used an `ExtensionTypeWriterFactory` pattern:
   
   ```java
   // Old approach - Factory pattern
   public interface ExtensionTypeWriterFactory {
       FieldWriter createWriter(ValueVector vector);
   }
   
   // Usage in ComplexCopier
   writer.addExtensionTypeWriterFactory(extensionTypeWriterFactory);
   writer.writeExtension(value);
   ```
   
   **How it worked:**
   1. Each extension type had a separate factory class (e.g., 
`UuidWriterFactory`)
   2. The factory was passed around as a parameter
   3. Writers stored factories and used them to create the appropriate writer
   4. TransferPair implementations also needed to carry factories
   
   ---
   
   ## Problems with the Factory Pattern
   
   ### 1. **Scalability Issues**
   
   - Every new extension type required creating a new factory class
   - Factories had to be passed through multiple layers of the API
   - TransferPair implementations became complex with factory parameters
   
   ### 2. **External Extension Types**
   
   For developers implementing extension types **outside of arrow-java**:
   
   ```java
   // Developer needs to create TWO classes:
   class MyCustomType extends ExtensionType { ... }
   class MyCustomWriterFactory implements ExtensionTypeWriterFactory { ... }
   
   // And manage both separately
   ```
   
   This created unnecessary complexity and coupling.
   
   ### 3. **API Complexity**
   
   Methods proliferated with factory parameters:
   
   ```java
   // Old API
   ComplexCopier.copy(reader, writer, extensionTypeWriterFactory);
   writer.addExtensionTypeWriterFactory(factory);
   TransferPair.makeTransferPair(target, factory);
   ```
   
   ### 4. **Tight Coupling**
   
   The factory pattern created tight coupling between:
   - The type definition
   - The writer implementation  
   - The factory that connects them
   - All the code that needs to pass factories around
   
   ---
   
   ## New Solution: Type-Based Pattern
   
   ### Core Design Change
   
   Add a single abstract method to `ArrowType.ExtensionType`:
   
   ```java
   public abstract class ExtensionType extends ArrowType {
       // NEW METHOD
       public abstract FieldWriter getNewFieldWriter(ValueVector vector);
       
       // Existing methods
       public abstract ArrowType storageType();
       public abstract String extensionName();
       public abstract FieldVector getNewVector(...);
       // ...
   }
   ```
   
   ### Implementation Example
   
   ```java
   public class UuidType extends ExtensionType {
       @Override
       public FieldWriter getNewFieldWriter(ValueVector vector) {
           return new UuidWriterImpl((UuidVector) vector);
       }
       
       // Other methods...
   }
   ```
   
   ---
   
   ## Benefits of the New Design
   
   ### 1. **Simplicity**
   
   - **One class per extension type** instead of two (type + factory)
   - The type itself knows how to create its writer
   - Natural object-oriented design
   
   ### 2. **Cleaner API**
   
   ```java
   // New API - no factory parameters needed
   ComplexCopier.copy(reader, writer);
   writer.writeExtension(value, type);
   
   // The type provides the writer internally
   FieldWriter writer = extensionType.getNewFieldWriter(vector);
   ```
   
   ### 3. **Better Extensibility**
   
   External developers only need to implement one class:
   
   ```java
   public class MyCustomType extends ExtensionType {
       @Override
       public FieldWriter getNewFieldWriter(ValueVector vector) {
           return new MyCustomWriterImpl((MyCustomVector) vector);
       }
       
       // Implement other required methods...
   }
   ```
   
   ### 4. **Consistency with Existing Patterns**
   
   This mirrors how `MinorType` already works:
   
   ```java
   // MinorType enum (existing pattern)
   public enum MinorType {
       INT(new Int(...)) {
           @Override
           public FieldWriter getNewFieldWriter(ValueVector vector) {
               return new IntWriterImpl((IntVector) vector);
           }
       },
       // ...
   }
   
   // ExtensionType (new pattern - same approach)
   public class UuidType extends ExtensionType {
       @Override
       public FieldWriter getNewFieldWriter(ValueVector vector) {
           return new UuidWriterImpl((UuidVector) vector);
       }
   }
   ```
   
   ### 5. **Decoupling**
   
   - Writers no longer need to store or manage factories
   - TransferPair implementations simplified
   - Type information flows naturally through the `ArrowType` object
   
   ---
   
   ## Key Changes in the Codebase
   
   ### 1. **ArrowType.ExtensionType** (Core Change)
   
   ```java
   // Added abstract method
   public abstract FieldWriter getNewFieldWriter(ValueVector vector);
   ```
   
   ### 2. **Extension Type Implementations**
   
   **UuidType:**
   ```java
   @Override
   public FieldWriter getNewFieldWriter(ValueVector vector) {
       return new UuidWriterImpl((UuidVector) vector);
   }
   ```
   
   **OpaqueType:**
   ```java
   @Override
   public FieldWriter getNewFieldWriter(ValueVector vector) {
       throw new UnsupportedOperationException("WriterImpl not yet 
implemented.");
   }
   ```
   
   ### 3. **UnionWriter** (Now Supports Extension Types)
   
   ```java
   // OLD: Threw UnsupportedOperationException
   private ExtensionWriter getExtensionWriter(ArrowType arrowType) {
       throw new UnsupportedOperationException("ExtensionTypes are not 
supported yet.");
   }
   
   // NEW: Uses type to get writer
   private Map<ArrowType, ExtensionWriter> extensionWriters = new HashMap<>();
   
   private ExtensionWriter getExtensionWriter(ArrowType arrowType) {
       ExtensionWriter w = extensionWriters.get(arrowType);
       if (w == null) {
           w = ((ExtensionType) 
arrowType).getNewFieldWriter(data.getExtension(arrowType));
           w.setPosition(idx());
           extensionWriters.put(arrowType, w);
       }
       return w;
   }
   ```
   
   ### 4. **ComplexCopier** (Simplified)
   
   ```java
   // OLD: Required factory parameter
   case EXTENSIONTYPE:
       if (extensionTypeWriterFactory == null) {
           throw new IllegalArgumentException("Must provide 
ExtensionTypeWriterFactory");
       }
       writer.addExtensionTypeWriterFactory(extensionTypeWriterFactory);
       writer.writeExtension(value);
       break;
   
   // NEW: Type provides writer
   case EXTENSIONTYPE:
       if (reader.isSet()) {
           Object value = reader.readObject();
           if (value != null) {
               writer.writeExtension(value);
           }
       }
       break;
   ```
   
   ### 5. **PromotableWriter** (Simplified)
   
   ```java
   // OLD: Used UnionExtensionWriter
   case EXTENSIONTYPE:
       writer = new UnionExtensionWriter((ExtensionTypeVector) vector);
       break;
   
   // NEW: Type provides writer
   case EXTENSIONTYPE:
       writer = ((ExtensionType) 
vector.getField().getType()).getNewFieldWriter(vector);
       break;
   ```
   
   ---
   
   ## Migration Impact
   
   ### For Arrow Java Developers
   
   - **Removed:** `ExtensionTypeWriterFactory` interface and implementations
   - **Removed:** Factory parameters from APIs
   - **Added:** `getNewFieldWriter()` method to extension types
   
   ### For External Extension Type Developers
   
   **Before:**
   ```java
   // Two classes needed
   class MyType extends ExtensionType { ... }
   class MyWriterFactory implements ExtensionTypeWriterFactory { ... }
   ```
   
   **After:**
   ```java
   // One class needed
   class MyType extends ExtensionType {
       @Override
       public FieldWriter getNewFieldWriter(ValueVector vector) {
           return new MyWriterImpl((MyVector) vector);
       }
   }
   ```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to