laskoviymishka commented on code in PR #1081:
URL: https://github.com/apache/iceberg-go/pull/1081#discussion_r3249578131
##########
table/arrow_utils.go:
##########
@@ -342,10 +342,19 @@ func (c convertToIceberg) Primitive(dt arrow.DataType) (result iceberg.NestedFie
panic(fmt.Errorf("%w: unsupported arrow type for conversion - %s", iceberg.ErrInvalidSchema, dt))
}
case *arrow.TimestampType:
- if dt.Unit == arrow.Nanosecond {
- if !c.downcastTimestamp {
- panic(fmt.Errorf("%w: 'ns' timestamp precision not supported", iceberg.ErrType))
+ if dt.Unit == arrow.Nanosecond && !c.downcastTimestamp {
+ if slices.Contains(utcAliases, dt.TimeZone) {
+ result.Type = iceberg.PrimitiveTypes.TimestampTzNs
+ } else if dt.TimeZone == "" {
+ result.Type = iceberg.PrimitiveTypes.TimestampNs
+ } else {
Review Comment:
Worth a line in the PR body noting that nanosecond timestamps with a UTC or
empty time zone and `downcastTimestamp=false` used to panic (`'ns' timestamp
precision not supported`) and now map to `TimestampTzNs` / `TimestampNs`. That
is the correct behavior for v3, but anyone bisecting #695 down the line will
appreciate an explicit note that this is a behavior change for v1/v2 callers:
they'll now produce a v3 schema and trip on `minFormatVersionForType` instead
of failing at conversion. Small enough to fold in here.
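For the PR body, the before/after could be sketched roughly as below. This is a
standalone model, not the iceberg-go code: `nsMappingBefore`, `nsMappingAfter`,
and the string type names are hypothetical stand-ins, and the `utcAliases`
contents are an assumption about the real list in `table/arrow_utils.go`.

```go
package main

import (
	"errors"
	"fmt"
	"slices"
)

// utcAliases stands in for the list in table/arrow_utils.go.
// The exact contents here are an assumption.
var utcAliases = []string{"UTC", "+00:00", "Etc/UTC", "Z"}

// nsMappingBefore models the old branch with downcastTimestamp=false:
// every nanosecond timestamp failed, whatever its time zone.
func nsMappingBefore(tz string) (string, error) {
	return "", errors.New("'ns' timestamp precision not supported")
}

// nsMappingAfter models the new branch with downcastTimestamp=false:
// UTC aliases and the empty time zone map to the v3 nanosecond types,
// and any other time zone is still rejected.
func nsMappingAfter(tz string) (string, error) {
	switch {
	case slices.Contains(utcAliases, tz):
		return "timestamptz_ns", nil
	case tz == "":
		return "timestamp_ns", nil
	default:
		return "", fmt.Errorf("unsupported arrow type for conversion - timestamp[ns, tz=%s]", tz)
	}
}

func main() {
	for _, tz := range []string{"UTC", "", "US/Pacific"} {
		_, errBefore := nsMappingBefore(tz)
		after, errAfter := nsMappingAfter(tz)
		fmt.Printf("tz=%q before: err=%v | after: type=%q err=%v\n", tz, errBefore, after, errAfter)
	}
}
```

The point for v1/v2 callers is the first two rows: the failure no longer happens
at conversion, so it has to be caught later by the format-version check.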
##########
table/arrow_utils_test.go:
##########
@@ -76,11 +76,11 @@ func TestArrowToIceberg(t *testing.T) {
{arrow.FixedWidthTypes.Timestamp_s, iceberg.PrimitiveTypes.TimestampTz, false, ""},
{arrow.FixedWidthTypes.Timestamp_ms, iceberg.PrimitiveTypes.TimestampTz, false, ""},
{arrow.FixedWidthTypes.Timestamp_us, iceberg.PrimitiveTypes.TimestampTz, true, ""},
- {arrow.FixedWidthTypes.Timestamp_ns, nil, false, "'ns' timestamp precision not supported"},
+ {arrow.FixedWidthTypes.Timestamp_ns, iceberg.PrimitiveTypes.TimestampTzNs, true, ""},
{&arrow.TimestampType{Unit: arrow.Second}, iceberg.PrimitiveTypes.Timestamp, false, ""},
{&arrow.TimestampType{Unit: arrow.Millisecond}, iceberg.PrimitiveTypes.Timestamp, false, ""},
{&arrow.TimestampType{Unit: arrow.Microsecond}, iceberg.PrimitiveTypes.Timestamp, true, ""},
- {&arrow.TimestampType{Unit: arrow.Nanosecond}, nil, false, "'ns' timestamp precision not supported"},
+ {&arrow.TimestampType{Unit: arrow.Nanosecond}, iceberg.PrimitiveTypes.TimestampNs, true, ""},
Review Comment:
while we're here, could we also add a non-UTC tz nanosecond row to lock in
the new panic branch? something like:
```go
{&arrow.TimestampType{Unit: arrow.Nanosecond, TimeZone: "US/Pacific"}, nil,
false, "unsupported arrow type for conversion - timestamp[ns, tz=US/Pacific]"},
```
Place it right under the existing `us, tz=US/Pacific` panic row. The loop
always calls `ArrowTypeToIceberg(tt.dt, false)` (downcastTimestamp=false), so
this row exercises the third sub-case of the new branch (the panic at line
348), which the two added rows don't cover. Small enough to fold into this PR,
wdyt?
##########
table/arrow_utils.go:
##########
@@ -342,10 +342,19 @@ func (c convertToIceberg) Primitive(dt arrow.DataType) (result iceberg.NestedFie
panic(fmt.Errorf("%w: unsupported arrow type for conversion - %s", iceberg.ErrInvalidSchema, dt))
}
case *arrow.TimestampType:
- if dt.Unit == arrow.Nanosecond {
- if !c.downcastTimestamp {
- panic(fmt.Errorf("%w: 'ns' timestamp precision not supported", iceberg.ErrType))
+ if dt.Unit == arrow.Nanosecond && !c.downcastTimestamp {
+ if slices.Contains(utcAliases, dt.TimeZone) {
+ result.Type = iceberg.PrimitiveTypes.TimestampTzNs
+ } else if dt.TimeZone == "" {
+ result.Type = iceberg.PrimitiveTypes.TimestampNs
+ } else {
+ panic(fmt.Errorf("%w: unsupported arrow type for conversion - %s", iceberg.ErrInvalidSchema, dt))
}
+
+ return result
Review Comment:
Tiny readability nit while we're here: `if dt.Unit == arrow.Nanosecond`
shows up twice in a row with an early `return` in between. We could fold both
into one block to keep the two ns paths adjacent:
```go
if dt.Unit == arrow.Nanosecond {
	if !c.downcastTimestamp {
		switch {
		case slices.Contains(utcAliases, dt.TimeZone):
			result.Type = iceberg.PrimitiveTypes.TimestampTzNs
		case dt.TimeZone == "":
			result.Type = iceberg.PrimitiveTypes.TimestampNs
		default:
			panic(fmt.Errorf("%w: unsupported arrow type for conversion - %s", iceberg.ErrInvalidSchema, dt))
		}

		return result
	}
	slog.Warn("downcasting nanosecond timestamp to microsecond, precision loss may occur")
}
```
equivalent semantics, take it or leave it.