[ https://issues.apache.org/jira/browse/SPARK-54203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk updated SPARK-54203:
-----------------------------
    Description: 
h2. What

Add a {{TimeType}} branch to {{RowToColumnConverter}} so rows containing TIME columns can be
converted to columnar (Arrow/columnar batch) format.

h2. Why (confirmed needed)

{{RowToColumnConverter.getConverterForType}} in {{sql/core/.../execution/Columnar.scala}}
handles {{LongType | TimestampType | TimestampNTZType | DayTimeIntervalType}} (all long-backed)
but has no {{TimeType}} case, so a TIME column falls through to {{unsupportedDataTypeError}}.
This blocks row-to-column conversion paths (e.g. {{ColumnarToRow}}/{{RowToColumnar}}
transitions, vectorized execution) for TIME. (Supersedes the original "same change as
TimestampNTZ in commit 47276ab9902" note - the change is required, not speculative.)

h2. Scope

* Add the {{TimeType}} case to {{getConverterForType}} using the long-backed
  ({{LongConverter}}) path, since TIME is stored as nanos-of-day {{Long}}.
* Confirm the on/off-heap column vectors already round-trip TIME (they do - TIME is in the
  {{longData}} group).
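
For illustration only, the shape of the change described in the Scope items above can be modeled in a small, self-contained Scala sketch. It does not use Spark's classes: {{SketchType}}, {{SketchTime}}, {{LongVector}}, {{converterFor}} and {{roundTrip}} are hypothetical stand-ins for the SQL data types, a long-backed column vector and {{getConverterForType}}. The only point is that TIME joins the existing long-backed group instead of falling through to the unsupported-type error.

```scala
// Toy model only: every name below is a hypothetical stand-in, not Spark's API.
object TimeConverterSketch {
  sealed trait SketchType
  case object SketchLong extends SketchType
  case object SketchTimestampNTZ extends SketchType
  final case class SketchTime(precision: Int) extends SketchType
  case object SketchUnsupported extends SketchType

  // Stand-in for a long-backed writable column vector.
  final class LongVector {
    private val data = scala.collection.mutable.ArrayBuffer.empty[Long]
    def appendLong(v: Long): Unit = data += v
    def getLong(i: Int): Long = data(i)
    def size: Int = data.length
  }

  // A converter appends one row value to the column vector.
  type Converter = (Long, LongVector) => Unit
  private val longConverter: Converter = (v, vec) => vec.appendLong(v)

  def converterFor(dt: SketchType): Converter = dt match {
    // The added alternative: TIME values are nanos-of-day Longs,
    // so they reuse the long-backed converter.
    case SketchLong | SketchTimestampNTZ | _: SketchTime => longConverter
    // Anything not listed still fails, as TIME did before the change.
    case other =>
      throw new UnsupportedOperationException(s"Unsupported data type: $other")
  }

  // Write row values into a vector, then read them back in order.
  def roundTrip(dt: SketchType, values: Seq[Long]): Seq[Long] = {
    val vec = new LongVector
    val convert = converterFor(dt)
    values.foreach(convert(_, vec))
    (0 until vec.size).map(vec.getLong)
  }

  def main(args: Array[String]): Unit = {
    // java.time.LocalTime exposes the same nanos-of-day representation.
    val noon = java.time.LocalTime.NOON.toNanoOfDay
    println(roundTrip(SketchTime(6), Seq(0L, noon)))
  }
}
```

In Spark itself the corresponding change would presumably be one extra alternative on the existing long-backed case; the exact pattern used there is an assumption of this sketch, not something this issue states.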

h2. Acceptance criteria

* A DataFrame with a TIME column survives a row<->column round-trip (e.g. cached columnar,
  or a plan that inserts {{RowToColumnar}}).
* Test added to {{ColumnarBatchSuite}} / the Columnar conversion tests.

  was:
Make the same change for the Time type that was made for TimestampNTZ (in 
commit [47276ab9902|https://github.com/apache/spark/commit/47276ab9902]).

I can't say whether it is needed for the Time type or not, but it was made for 
other types (including TimestampNTZ).

        Summary: Support TimeType in RowToColumnConverter  (was: Support Time in RowToColumnConverter)

> Support TimeType in RowToColumnConverter
> ----------------------------------------
>
>                 Key: SPARK-54203
>                 URL: https://issues.apache.org/jira/browse/SPARK-54203
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 4.1.0
>            Reporter: Bruce Robbins
>            Priority: Major