jbs-atolcd opened a new pull request, #6186: URL: https://github.com/apache/hop/pull/6186
… This commit introduces significant improvements to the Parquet Input and Output transforms by implementing comprehensive support for Parquet's Logical Types. Previously, the transforms relied primarily on primitive types, which led to conversion issues and data errors when handling complex types such as timestamps.

Key Changes & Features:

1. Parquet Input:
   * Logical Type Mapping: Refactors field discovery to use `LogicalTypeAnnotation` (instead of only the primitive type), enabling correct mapping of semantic types.
   * Timestamp/Date Precision: Implements a conversion mechanism that maps Parquet's timestamp units (MILLIS, MICROS, etc.) to Hop's `TYPE_TIMESTAMP` and `TYPE_DATE`, preserving precision and handling UTC adjustment.
   * JSON Support: Adds explicit support for the JSON Logical Type, converting the Parquet binary/string data into Hop's `TYPE_JSON` object.
   * Decimal Handling: Uses the precision and scale from `DecimalLogicalTypeAnnotation` to correctly convert binary/long Parquet decimals into Hop's `TYPE_BIGNUMBER`.
2. Parquet Output:
   * Date/Timestamp Consistency: Ensures that Hop's `TYPE_DATE` and `TYPE_TIMESTAMP` are consistently converted to a `LONG` representation with the Parquet `timestampMillis` logical annotation, the most widely compatible format.
   * Schema Mapping: Maps Hop's `TYPE_JSON` and `TYPE_UUID` to Parquet `STRING` types in the schema definition.

Testing and Validation:

* Test Data Enrichment: The test dataset (`golden-parquet-input.json`) was extended with new fields: `isActive` (Boolean), `registrationTimestamp` (Timestamp), and `metadataJson` (JSON), so the new types are covered end-to-end.
* Unit Test Update: The unit test configuration (`0029-parquet-input UNIT.json`) was updated to map and validate the new fields, confirming that the transform works correctly.

This resolves a major limitation regarding data fidelity when dealing with common modern Parquet schemas.
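The timestamp, decimal and output conversions described above can be illustrated with a short, JDK-only sketch. It is not the code in this PR: the class `ParquetConversionSketch`, its `Unit` enum (mirroring the MILLIS/MICROS/NANOS units of Parquet's TIMESTAMP logical type) and every method name are hypothetical, and it deliberately avoids the Parquet and Hop APIs. It assumes Hop timestamps are carried as `java.sql.Timestamp` and big numbers as `BigDecimal`, and it follows the Parquet format rules that a TIMESTAMP is an INT64 count of the annotated unit since the Unix epoch and a DECIMAL is an unscaled integer (big-endian two's complement when stored as binary) combined with the annotation's scale.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.sql.Timestamp;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.util.Date;

/** Hypothetical helper illustrating the conversions described in the PR; not Hop's actual code. */
public class ParquetConversionSketch {

  /** Mirrors the units of Parquet's TIMESTAMP logical type. */
  enum Unit {
    MILLIS(1_000L), MICROS(1_000_000L), NANOS(1_000_000_000L);

    final long perSecond; // ticks of this unit in one second

    Unit(long perSecond) {
      this.perSecond = perSecond;
    }
  }

  /**
   * Converts an INT64 Parquet timestamp into a java.sql.Timestamp, keeping sub-millisecond digits.
   * adjustedToUtc = true: the value is an instant since the epoch in UTC.
   * adjustedToUtc = false (assumed handling): the value encodes wall-clock fields, which are kept
   * unchanged and interpreted in the JVM's local time zone.
   */
  static Timestamp toTimestamp(long value, Unit unit, boolean adjustedToUtc) {
    // floorDiv/floorMod keep pre-1970 (negative) values correct
    long seconds = Math.floorDiv(value, unit.perSecond);
    long nanoOfSecond = Math.floorMod(value, unit.perSecond) * (1_000_000_000L / unit.perSecond);
    if (adjustedToUtc) {
      return Timestamp.from(Instant.ofEpochSecond(seconds, nanoOfSecond));
    }
    LocalDateTime wallClock =
        LocalDateTime.ofEpochSecond(seconds, (int) nanoOfSecond, ZoneOffset.UTC);
    return Timestamp.valueOf(wallClock);
  }

  /** DECIMAL stored as BINARY or FIXED_LEN_BYTE_ARRAY: big-endian two's-complement unscaled value. */
  static BigDecimal toBigDecimal(byte[] unscaled, int scale) {
    return new BigDecimal(new BigInteger(unscaled), scale);
  }

  /** DECIMAL stored as INT32 or INT64. */
  static BigDecimal toBigDecimal(long unscaled, int scale) {
    return BigDecimal.valueOf(unscaled, scale);
  }

  /** Output side: a Hop Date/Timestamp to the INT64 value written under TIMESTAMP(MILLIS). */
  static long toTimestampMillis(Date value) {
    // Timestamp.getTime() includes the millisecond part of its nanos; finer digits are dropped
    return value.getTime();
  }

  public static void main(String[] args) {
    Timestamp ts = toTimestamp(1_500_000L, Unit.MICROS, true);
    System.out.println(ts.getTime() + " ms, nanos=" + ts.getNanos());
    System.out.println(toBigDecimal(new byte[] {0x04, (byte) 0xD2}, 2));
    System.out.println(toBigDecimal(-1234L, 2));
  }
}
```

The `adjustedToUtc = false` branch is only one plausible reading of Parquet's `isAdjustedToUTC` flag for local timestamps. On the output side, `toTimestampMillis` shows why standardising on `timestampMillis` truncates anything finer than a millisecond, even though the input side can preserve microseconds and nanoseconds.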
**Please** add a meaningful description for your change here

------------------------

Thank you for your contribution!

Follow this checklist to help us incorporate your contribution quickly and easily:

- [x] Run `mvn clean install apache-rat:check` to make sure basic checks pass. A more thorough check will be performed on your pull request automatically.
- [ ] If you have a group of commits related to the same change, please squash your commits into one and force push your branch using `git rebase -i`.
- [x] Mention the appropriate issue in your description (for example: `addresses #123`), if applicable.

To make clear that you license your contribution under the [Apache License Version 2.0, January 2004](http://www.apache.org/licenses/LICENSE-2.0) you have to acknowledge this by using the following check-box.

- [x] I hereby declare this contribution to be licensed under the [Apache License Version 2.0, January 2004](http://www.apache.org/licenses/LICENSE-2.0)
- [ ] In any other case, please file an [Apache Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
