[ 
https://issues.apache.org/jira/browse/SPARK-57572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18090324#comment-18090324
 ] 

Shrirang Mhalgi commented on SPARK-57572:
-----------------------------------------

Hi [~maxgekk], I'd like to work on this one if it's available.

I'll follow the existing DateType/TimestampType inference pattern in 
CSVInferSchema and JsonInferSchema, gate it behind spark.sql.timeType.enabled, 
and add tests in the inference suites. Let me know if you'd prefer to handle it 
yourself or if there are any dependencies I should be aware of.
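
To make the ordering question concrete, here is a minimal sketch of where a time step might sit in a simplified inference ladder. It is illustrative Python only, not Spark code: the function names (infer_field, try_parse_time), the TIME_TYPE_ENABLED constant standing in for spark.sql.timeType.enabled, the fixed format strings, and the exact position of the time step are all assumptions made for this example. The real ladder has more steps and also merges types across rows, which is not modeled here.

```python
from datetime import datetime

# Hypothetical stand-in for the spark.sql.timeType.enabled flag.
TIME_TYPE_ENABLED = True


def _parses(value, fmt):
    """Return True if `value` matches the strptime pattern `fmt` exactly."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False


def try_parse_time(value, time_format="%H:%M:%S"):
    """Hypothetical time step: accept time-only strings such as 12:13:14."""
    return _parses(value, time_format)


def infer_field(value, time_type_enabled=TIME_TYPE_ENABLED):
    """Walk a simplified ladder and return the first type name that fits."""
    if value == "":
        return "NullType"
    try:
        int(value)
        return "LongType"
    except ValueError:
        pass
    try:
        float(value)
        return "DoubleType"
    except ValueError:
        pass
    if _parses(value, "%Y-%m-%d"):
        return "DateType"
    # Assumed placement: after the date step, before the timestamp step,
    # and only when the flag is on, so existing results do not change.
    if time_type_enabled and try_parse_time(value):
        return "TimeType"
    if _parses(value, "%Y-%m-%d %H:%M:%S"):
        return "TimestampType"
    return "StringType"
```

With the flag off, a value like 12:13:14 falls through to StringType, which is the "existing inference is unchanged" case from the acceptance criteria. Date and timestamp strings take the same path whether the flag is on or off.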

> Infer the TIME type during CSV and JSON schema inference
> --------------------------------------------------------
>
>                 Key: SPARK-57572
>                 URL: https://issues.apache.org/jira/browse/SPARK-57572
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 4.3.0
>            Reporter: Max Gekk
>            Priority: Major
>
> h2. What
> Infer {{TimeType}} for time-only string values during CSV and JSON schema
> inference (when no explicit schema is provided), analogous to the existing
> Date/Timestamp inference.
> h2. Gap
> {{CSVInferSchema}} (and {{JsonInferSchema}}) infer
> DateType/TimestampNTZType/TimestampType but never {{TimeType}}. With an
> explicit schema, TIME read/write already works; only auto-inference is
> missing.
> h2. Scope
> * Add a tryParseTime step to the inference type ladder, ordered to avoid
>   regressions (a time-only string like 12:13:14 must not be misclassified,
>   and existing date/timestamp inference must be unchanged).
> * Consider gating behind the timeFormat option / a config and the
>   spark.sql.timeType.enabled flag, given inference ambiguity.
> h2. Acceptance criteria
> * CSV/JSON files with time-only columns infer TimeType (when enabled);
>   existing inference is unchanged.
> * Tests in CSVInferSchemaSuite / JsonInferSchemaSuite and the CSV/JSON file
>   suites.



