[PR] [SEDONA-729] Add _metadata hidden column support for shapefile DataSource V2 reader [sedona]

via GitHub Sat, 14 Feb 2026 23:09:52 -0800


jiayuasu opened a new pull request, #2653:
URL: https://github.com/apache/sedona/pull/2653


   ## Did you read the Contributor Guide?
   
   - Yes, I have read the [Contributor 
Rules](https://sedona.apache.org/latest/community/rule/) and [Contributor 
Developer Guide](https://sedona.apache.org/latest/community/develop/)
   
   ## Is this PR related to a ticket?
   
   - Yes, and the PR name follows the format `[SEDONA-XXX] my subject`.
   
   This PR fixes [SEDONA-729](https://issues.apache.org/jira/browse/SEDONA-729).
   
   ## What changes were proposed in this PR?
   
   When reading shapefiles via the DataSource V2 API, the standard `_metadata` 
hidden column (containing `file_path`, `file_name`, `file_size`, 
`file_block_start`, `file_block_length`, `file_modification_time`) was missing 
from the DataFrame. This is because `ShapefileTable` did not implement Spark's 
`SupportsMetadataColumns` interface.
   
   This PR implements `_metadata` support across all four Spark version modules 
(3.4, 3.5, 4.0, 4.1) by modifying four source files per module:
   
   1. [...]
   
   [...] metadata values (path, name, size, block offset/length, modification time) from 
the `.shp` `PartitionedFile`, and wraps the base reader in a 
`PartitionReaderWithMetadata` that joins data rows with metadata using 
`JoinedRow` + `GenerateUnsafeProjection`. Correctly handles Spark's struct 
pruning by building only the requested sub-fields.
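
The wrapping step described above can be modeled outside Spark. The following is a minimal sketch in plain Python, not the Scala code from this PR: `FileSplit`, `build_metadata`, `MetadataJoiningReader`, and the tuple-based rows are hypothetical stand-ins for `PartitionedFile`, the metadata row, `PartitionReaderWithMetadata`, and `JoinedRow`. It assumes only what the description states, namely that the metadata values come from the file split and that only the requested sub-fields are built. The path and values at the bottom are made up for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Sequence, Tuple


@dataclass(frozen=True)
class FileSplit:
    """Hypothetical stand-in for the file information a split carries."""
    file_path: str
    file_size: int
    file_block_start: int
    file_block_length: int
    file_modification_time: int


def build_metadata(split: FileSplit, requested: Sequence[str]) -> Tuple:
    """Build only the requested _metadata sub-fields, in the requested order."""
    available = {
        "file_path": split.file_path,
        "file_name": split.file_path.rsplit("/", 1)[-1],
        "file_size": split.file_size,
        "file_block_start": split.file_block_start,
        "file_block_length": split.file_block_length,
        "file_modification_time": split.file_modification_time,
    }
    return tuple(available[name] for name in requested)


class MetadataJoiningReader:
    """Wraps a base row iterator and appends one metadata struct to each row."""

    def __init__(self, base_rows: Iterable[Tuple], split: FileSplit,
                 requested: Sequence[str]) -> None:
        self._base_rows = base_rows
        # The metadata is constant for a given file, so it is computed once.
        self._metadata = build_metadata(split, requested)

    def __iter__(self) -> Iterator[Tuple]:
        for row in self._base_rows:
            # Data columns first, then the pruned metadata struct as one column.
            yield (*row, self._metadata)


# Illustrative values only; the query asked for two of the six sub-fields.
split = FileSplit("/data/roads/roads.shp", 2048, 0, 2048, 1700000000000)
reader = MetadataJoiningReader([("POINT (1 2)", "a")], split,
                               ["file_name", "file_size"])
rows = list(reader)
```

Computing the metadata tuple once per file rather than once per row mirrors the fact that all rows from one `.shp` file share the same metadata values.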
   
   
   [...] `WHERE` clauses
   - Projection: `_metadata` fields can be selected alongside data columns
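
From the user's side, filtering and projection on `_metadata` might look like the sketch below. It assumes `df` is a DataFrame loaded through Sedona's shapefile reader (for example `spark.read.format("shapefile").load(path)`) and that the geometry column keeps its default name `geometry`. The helper names and the file name `roads.shp` are hypothetical and do not come from this PR's test suite.

```python
def select_with_metadata(df):
    """Project a data column together with two _metadata sub-fields."""
    return df.select("geometry", "_metadata.file_name", "_metadata.file_size")


def only_rows_from_roads(df):
    """Keep rows read from one file by filtering on _metadata.file_name.

    The file name is a made-up example value.
    """
    return df.where("_metadata.file_name = 'roads.shp'")
```

Because `_metadata` is a hidden column, it is not part of `df.columns` or `SELECT *` output; it appears only when referenced explicitly, as in the two helpers above.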
   
   All tests pass on all four Spark versions:
   - spark-3.4 (Scala 2.12): 53 tests passed
   - spark-3.5 (Scala 2.12): 33 tests passed
   - spark-4.0 (Scala 2.13): 33 tests passed
   - spark-4.1 (Scala 2.13): 33 tests passed
   
   ## Did this PR include necessary documentation updates?
   
   - No, this PR does not affect any public API, so there is no need to change the 
documentation. The `_metadata` column is a standard Spark hidden column that is 
automatically available to users; no Sedona-specific API changes are 
introduced.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

