maxcountryman opened a new issue, #1920: URL: https://github.com/apache/arrow-rs/issues/1920

**Describe the bug**

I'm unable to persist [fields represented as e.g. `Timestamp` in Arrow](https://github.com/maxcountryman/warc-parquet/blob/4aeddceca7d856b98eee8c8410adf3c1e841ae96/src/schema.rs#L10) as recognized timestamps in the written Parquet.

**To Reproduce**

I've written [a simple utility](https://github.com/maxcountryman/warc-parquet) for converting WARC files to Parquet. Using it, you'll produce Parquet with a schema that looks something like this:

```
message arrow_schema {
  required binary id (STRING);
  required int32 content_length (INTEGER(32,false));
  required int64 date;
  required binary type (STRING);
  optional binary content_type (STRING);
  optional binary concurrent_to (STRING);
  optional binary block_digest (STRING);
  optional binary payload_digest (STRING);
  optional binary ip_address (STRING);
  optional binary refers_to (STRING);
  optional binary target_uri (STRING);
  optional binary truncated (STRING);
  optional binary warc_info_id (STRING);
  optional binary filename (STRING);
  optional binary profile (STRING);
  optional binary identified_payload_type (STRING);
  optional int32 segment_number (INTEGER(32,false));
  optional binary segment_origin_id (STRING);
  optional int32 segment_total_length (INTEGER(32,false));
  optional binary body;
}
```

Here the `date` field is written as a plain `int64` with no `TIMESTAMP` annotation.

**Expected behavior**

Looking at Parquet produced from a sample dataset (NYC taxi data), the timestamp columns there are correctly annotated with `TIMESTAMP`:

```
message schema {
  optional binary hvfhs_license_num (STRING);
  optional binary dispatching_base_num (STRING);
  optional binary originating_base_num (STRING);
  optional int64 request_datetime (TIMESTAMP(MICROS,false));
  optional int64 on_scene_datetime (TIMESTAMP(MICROS,false));
  optional int64 pickup_datetime (TIMESTAMP(MICROS,false));
  optional int64 dropoff_datetime (TIMESTAMP(MICROS,false));
  optional int64 PULocationID;
  optional int64 DOLocationID;
  optional double trip_miles;
  optional int64 trip_time;
  optional double base_passenger_fare;
  optional double tolls;
  optional double bcf;
  optional double sales_tax;
  optional double congestion_surcharge;
  optional double airport_fee;
  optional double tips;
  optional double driver_pay;
  optional binary shared_request_flag (STRING);
  optional binary shared_match_flag (STRING);
  optional binary access_a_ride_flag (STRING);
  optional binary wav_request_flag (STRING);
  optional binary wav_match_flag (STRING);
}
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
