AlgoDeveloper400 opened a new pull request, #3161:
URL: https://github.com/apache/iceberg-python/pull/3161

   # fix: handle Windows drive letters in `parse_location`
   
   ## Rationale for this change
   
   When a Windows user passes a local file path like `C:\Users\file.avro` to `PyArrowFileIO`,
   Python's `urlparse` incorrectly treats the Windows drive letter `C` as a URL scheme (like `s3` or `http`).
   
   This caused PyIceberg to crash with:
   ```
   Unrecognized filesystem type in URI: 'c'
   ```
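
The misparse can be reproduced with the standard library alone, without PyIceberg installed. The snippet below is a minimal sketch; the sample paths are made up for illustration. Note that `urlparse` lowercases the scheme, which is why the error message reports `'c'` rather than `'C'`.

```python
from urllib.parse import urlparse

# Illustrative sample locations: two Windows-style paths, a POSIX path, and a real URI.
locations = [
    "C:\\Users\\file.avro",
    "D:/data/table/metadata.json",
    "/tmp/file.avro",
    "s3://bucket/file.avro",
]

# Map each location to the scheme that urlparse extracts from it.
schemes = {loc: urlparse(loc).scheme for loc in locations}

for loc, scheme in schemes.items():
    print(f"{loc!r:35} -> scheme={scheme!r}")
```

Both Windows-style paths come back with a one-letter scheme, while the POSIX path has an empty scheme and the S3 URI keeps `s3`.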
   
   ---
   
   ## The Fix
   
   **Before ❌ (Original Code):**
   ```python
   uri = urlparse(location)
   
   if not uri.scheme:
       default_scheme = properties.get("DEFAULT_SCHEME", "file")
       default_netloc = properties.get("DEFAULT_NETLOC", "")
       return default_scheme, default_netloc, os.path.abspath(location)
   ```
   
   **After ✅ (Fixed Code):**
   ```python
   uri = urlparse(location)
   
   if not uri.scheme or (len(uri.scheme) == 1 and uri.scheme.isalpha()):
       # len == 1 and isalpha() catches Windows drive letters like C:\ D:\
       default_scheme = properties.get("DEFAULT_SCHEME", "file")
       default_netloc = properties.get("DEFAULT_NETLOC", "")
       return default_scheme, default_netloc, os.path.abspath(location)
   ```
   
   **The only change:**
   ```python
   # Before ❌
   if not uri.scheme:
   
   # After ✅
   if not uri.scheme or (len(uri.scheme) == 1 and uri.scheme.isalpha()):
   ```
   
   The added condition checks if the scheme is a **single alphabetic character** (e.g. `C`, `D`, `E`)
   and treats it as a Windows drive letter instead of a URL scheme.
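
To make the effect of the condition easier to see in isolation, here is a standalone sketch that mirrors the snippet above. `is_drive_letter_scheme` and `parse_location_sketch` are hypothetical names used only for illustration, and the final `return` for real URIs is a simplification that is not taken from PyIceberg. The sketch assumes that no genuine single-letter URI scheme needs to be supported.

```python
import os
from typing import Optional, Tuple
from urllib.parse import urlparse


def is_drive_letter_scheme(scheme: str) -> bool:
    """Hypothetical helper: True when a parsed scheme looks like a Windows drive letter."""
    return len(scheme) == 1 and scheme.isalpha()


def parse_location_sketch(
    location: str, properties: Optional[dict] = None
) -> Tuple[str, str, str]:
    """Simplified stand-in for the patched branch, not the real PyIceberg function."""
    properties = properties or {}
    uri = urlparse(location)

    # No scheme, or a one-letter "scheme" that is really a drive letter:
    # fall back to the default (local) filesystem.
    if not uri.scheme or is_drive_letter_scheme(uri.scheme):
        default_scheme = properties.get("DEFAULT_SCHEME", "file")
        default_netloc = properties.get("DEFAULT_NETLOC", "")
        return default_scheme, default_netloc, os.path.abspath(location)

    # Assumption for this sketch only: pass real URIs through unchanged.
    return uri.scheme, uri.netloc, uri.path


# Drive-letter paths now take the local-file branch; real URIs are unaffected.
windows_result = parse_location_sketch("C:\\Users\\file.avro")
s3_result = parse_location_sketch("s3://bucket/key.avro")
print(windows_result[:2])
print(s3_result)
```

Only the scheme and netloc are printed for the Windows path because `os.path.abspath` is platform-dependent: off Windows it prefixes the current working directory instead of returning the path unchanged.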
   
   ---
   
   ## Example
   
   ```python
   from pyiceberg.io.pyarrow import PyArrowFileIO
   
   io = PyArrowFileIO()
   
   # Before fix - crashed with: Unrecognized filesystem type in URI: 'c'
   # After fix - works correctly
   scheme, netloc, path = io.parse_location("C:\\Users\\test\\file.avro")
   
   print(scheme)  # 'file'
   print(netloc)  # ''
   print(path)    # 'C:\\Users\\test\\file.avro'
   ```
   
   ---
   
   ## Impact
   
   This fix affects all local file operations on Windows including:
   - Reading local Iceberg tables
   - Writing local Iceberg tables
   - Any local Avro/Parquet file operations
   
   ---
   
   ## Are these changes tested?
   
   Yes - existing tests now pass on Windows.
   
   **`tests/test_avro_sanitization.py`**
   ```
   python -m pytest tests/test_avro_sanitization.py -v
   ```
   ```
   tests/test_avro_sanitization.py::test_comprehensive_field_name_sanitization  PASSED
   tests/test_avro_sanitization.py::test_comprehensive_avro_compatibility       PASSED
   tests/test_avro_sanitization.py::test_emoji_field_name_sanitization          PASSED
   ```
   
   **`tests/io/test_pyarrow.py`**
   ```
   python -m pytest tests/io/test_pyarrow.py::test_pyarrow_infer_local_fs_from_path -v
   ```
   ```
   tests/io/test_pyarrow.py::test_pyarrow_infer_local_fs_from_path  PASSED
   ```
   
   ---
   
   ## Are there any user-facing changes?
   
   Yes - fixes local file access on Windows for all PyIceberg users.

