pearu opened a new pull request, #50146: URL: https://github.com/apache/arrow/pull/50146
### Rationale for this change

CSV columns explicitly typed as `date32`, `date64`, `time32` or `time64` can only be parsed from strict ISO-8601 strings: `ConvertOptions::timestamp_parsers` is consulted only for `timestamp` columns. Reading e.g. `15-OCT-15` into a `date32` column fails even with `timestamp_parsers=["%d-%b-%y"]`, and `7:55:00` (non-zero-padded hour) fails for `time32[s]`. Users currently work around this by declaring such columns as `timestamp`, reading, then casting back to the date/time type.

Closes #28303.

### What changes are included in this PR?

- A new `DateTimeWithParsersValueDecoder` in `csv/converter.cc`, used for date32/date64/time32/time64 columns when `timestamp_parsers` is non-empty. It tries the built-in ISO-8601 parser first (preserving all existing behavior), then each configured parser in order. A timestamp produced by a fallback parser is floored to the day boundary for dates and reduced to the time of day for times, consistent with casting a timestamp to a date or time type. Values carrying a zone offset are rejected, as for zone-less timestamp columns. When no parsers are configured, the pre-existing decoder is used unchanged.
- Type inference is deliberately unaffected: the Date/Time inference stages now explicitly use options with `timestamp_parsers` cleared, so inference keeps strict ISO-8601 semantics (otherwise a value with a time-of-day part could be inferred as a date and silently truncated). The existing `test_timestamp_parsers` Python test pins this behavior.
- Documentation of the fallback and flooring semantics in `ConvertOptions::timestamp_parsers` (C++ and Python docstrings) and a new "Date and time parsing" section in the C++ CSV user guide.

### Are these changes tested?
Yes:

- New C++ tests (`Date32Conversion.UserDefinedParsers`, `Date64Conversion.UserDefinedParsers`, `Time32Conversion.UserDefinedParsers`, `Time64Conversion.UserDefinedParsers`) covering custom formats, mixed ISO + custom values in one column (backward compatibility of ISO values when parsers are set), pre-epoch flooring with a time-of-day component (distinguishes floor from truncating division), time-of-day extraction from pre-epoch timestamps, zone-offset rejection, and error cases.
- New Python tests with the reproducers from #28303 and #41488, plus an inference-unchanged guard.

### Are there any user-facing changes?

Yes: `ConvertOptions::timestamp_parsers` now also applies, as a fallback after ISO-8601, to columns explicitly typed as date32/date64/time32/time64 (previously such values always errored). No breaking changes: behavior without `timestamp_parsers` is untouched, ISO values keep parsing when parsers are set, and type inference is unchanged. All language bindings gain the behavior without API changes.

### AI usage disclosure

This PR was developed with AI assistance (Claude Code): the decoder, tests and documentation were AI-generated under my direction, then reviewed line-by-line and iterated on by me (design decisions: fallback-after-ISO semantics, silent flooring, inference isolation, and several implementation details adjusted during review). I own and can debug these changes.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
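For readers following the flooring semantics described under "What changes are included in this PR?", here is a rough standalone model in plain Python. It is not the Arrow decoder: `to_epoch_seconds`, `floor_to_days` and `time_of_day` are hypothetical helpers invented for this illustration, the sketch models only the fallback parser loop (not the ISO-8601 first attempt), and it assumes an English or C locale so that `%b` matches `OCT` and `DEC`. It shows why floor division, rather than truncating division, is needed for pre-epoch values.

```python
from datetime import datetime, timedelta

SECONDS_PER_DAY = 86_400
EPOCH = datetime(1970, 1, 1)


def to_epoch_seconds(value, formats):
    """Hypothetical helper: try each strptime format in order, as a fallback chain."""
    for fmt in formats:
        try:
            parsed = datetime.strptime(value, fmt)
        except ValueError:
            continue  # this format did not match, try the next one
        if parsed.tzinfo is not None:
            # Model of the "zone offsets are rejected" rule from the description.
            raise ValueError(f"zone offset not allowed: {value!r}")
        # Integer seconds since the epoch (negative for pre-epoch values).
        return (parsed - EPOCH) // timedelta(seconds=1)
    raise ValueError(f"no parser matched {value!r}")


def floor_to_days(seconds):
    """Days since the epoch, rounded toward negative infinity."""
    return seconds // SECONDS_PER_DAY


def time_of_day(seconds):
    """Seconds since midnight; Python's % is non-negative for a positive divisor."""
    return seconds % SECONDS_PER_DAY


# 1969-12-31 18:00:00 is six hours before the epoch.
secs = to_epoch_seconds("31-DEC-1969 18:00:00", ["%d-%b-%Y %H:%M:%S"])  # -21600
day_floor = floor_to_days(secs)            # -1, i.e. 1969-12-31 (correct day)
day_trunc = int(secs / SECONDS_PER_DAY)    # 0, i.e. 1970-01-01 (wrong day)
tod = time_of_day(secs)                    # 64800, i.e. 18:00:00
```

Truncating division rounds toward zero, so any pre-epoch timestamp with a non-zero time of day would land on the following day; floor division keeps the date and the time of day consistent with each other.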
