jiayuasu opened a new pull request, #2653: URL: https://github.com/apache/sedona/pull/2653
## Did you read the Contributor Guide?

- Yes, I have read the [Contributor Rules](https://sedona.apache.org/latest/community/rule/) and [Contributor Developer Guide](https://sedona.apache.org/latest/community/develop/)

## Is this PR related to a ticket?

- Yes, and the PR name follows the format `[SEDONA-XXX] my subject`. This PR fixes [SEDONA-729](https://issues.apache.org/jira/browse/SEDONA-729).

## What changes were proposed in this PR?

When reading shapefiles via the DataSource V2 API, the standard `_metadata` hidden column (containing `file_path`, `file_name`, `file_size`, `file_block_start`, `file_block_length`, `file_modification_time`) was missing from the DataFrame. This is because `ShapefileTable` did not implement Spark's `SupportsMetadataColumns` interface.

This PR implements `_metadata` support across all four Spark version modules (3.4, 3.5, 4.0, 4.1) by modifying four source files per module:

[…] data values (path, name, size, block offset/length, modification time) from the `.shp` `PartitionedFile`, and wraps the base reader in a `PartitionReaderWithMetadata` that joins data rows with metadata using `JoinedRow` + `GenerateUnsafeProjection`. Correctly handles Spark's struct pruning by building only the requested sub-fields.
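The row-joining and struct-pruning behaviour described above can be modelled in a few lines of plain Python. This is only a conceptual sketch, not the PR's Scala code: the function names `build_metadata_struct` and `rows_with_metadata`, the tuple-based row representation, and the sample file values are all hypothetical, and it assumes the pruned struct keeps the sub-fields in the order they were requested.

```python
from typing import Any, Dict, Iterable, Iterator, Sequence, Tuple

# Sub-fields of the standard `_metadata` struct, as listed in the PR description.
METADATA_FIELDS: Tuple[str, ...] = (
    "file_path",
    "file_name",
    "file_size",
    "file_block_start",
    "file_block_length",
    "file_modification_time",
)


def build_metadata_struct(
    file_info: Dict[str, Any], requested: Sequence[str]
) -> Tuple[Any, ...]:
    """Build the metadata struct containing only the requested sub-fields.

    Models struct pruning: a query that reads only `_metadata.file_name`
    gets a one-field struct instead of all six fields.
    """
    unknown = [name for name in requested if name not in METADATA_FIELDS]
    if unknown:
        raise KeyError(f"unknown _metadata sub-fields: {unknown}")
    return tuple(file_info[name] for name in requested)


def rows_with_metadata(
    data_rows: Iterable[Sequence[Any]],
    file_info: Dict[str, Any],
    requested: Sequence[str],
) -> Iterator[Tuple[Any, ...]]:
    """Append the per-file metadata struct to every data row.

    Models the wrapping reader: the metadata is constant for one file,
    so it is built once and joined onto each row the base reader yields.
    """
    metadata = build_metadata_struct(file_info, requested)
    for row in data_rows:
        yield tuple(row) + (metadata,)


# Hypothetical values standing in for what would come from the `.shp` file.
example_file = {
    "file_path": "file:/data/roads.shp",
    "file_name": "roads.shp",
    "file_size": 1024,
    "file_block_start": 0,
    "file_block_length": 1024,
    "file_modification_time": 1700000000000,
}

for joined in rows_with_metadata(
    [("LINESTRING (0 0, 1 1)", "Main St")], example_file, ["file_name", "file_size"]
):
    print(joined)
```

Building the struct once per file rather than once per row reflects that every row read from the same `.shp` file shares the same metadata values.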
## How was this patch tested?

[…]

- […] `WHERE` clauses
- Projection: `_metadata` fields can be selected alongside data columns

All tests pass on all four Spark versions:

- spark-3.4 (Scala 2.12): 53 tests passed
- spark-3.5 (Scala 2.12): 33 tests passed
- spark-4.0 (Scala 2.13): 33 tests passed
- spark-4.1 (Scala 2.13): 33 tests passed

## Did this PR include necessary documentation updates?

- No, this PR does not affect any public API so no need to change the documentation. The `_metadata` column is a standard Spark hidden column that is automatically available to users; no Sedona-specific API changes are introduced.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
