laskoviymishka commented on code in PR #1081:
URL: https://github.com/apache/iceberg-go/pull/1081#discussion_r3249588987


##########
table/arrow_utils_test.go:
##########
@@ -76,11 +76,11 @@ func TestArrowToIceberg(t *testing.T) {
                {arrow.FixedWidthTypes.Timestamp_s, iceberg.PrimitiveTypes.TimestampTz, false, ""},
                {arrow.FixedWidthTypes.Timestamp_ms, iceberg.PrimitiveTypes.TimestampTz, false, ""},
                {arrow.FixedWidthTypes.Timestamp_us, iceberg.PrimitiveTypes.TimestampTz, true, ""},
-               {arrow.FixedWidthTypes.Timestamp_ns, nil, false, "'ns' timestamp precision not supported"},
+               {arrow.FixedWidthTypes.Timestamp_ns, iceberg.PrimitiveTypes.TimestampTzNs, true, ""},
                {&arrow.TimestampType{Unit: arrow.Second}, iceberg.PrimitiveTypes.Timestamp, false, ""},
                {&arrow.TimestampType{Unit: arrow.Millisecond}, iceberg.PrimitiveTypes.Timestamp, false, ""},
                {&arrow.TimestampType{Unit: arrow.Microsecond}, iceberg.PrimitiveTypes.Timestamp, true, ""},
-               {&arrow.TimestampType{Unit: arrow.Nanosecond}, nil, false, "'ns' timestamp precision not supported"},
+               {&arrow.TimestampType{Unit: arrow.Nanosecond}, iceberg.PrimitiveTypes.TimestampNs, true, ""},

Review Comment:
   while we're here, could we also add a non-UTC tz nanosecond row to lock in 
the new panic branch? something like:
   
   ```go
   {&arrow.TimestampType{Unit: arrow.Nanosecond, TimeZone: "US/Pacific"}, nil, false, "unsupported arrow type for conversion - timestamp[ns, tz=US/Pacific]"},
   ```
   
   right under the existing `us, tz=US/Pacific` panic row. the loop always 
calls `ArrowTypeToIceberg(tt.dt, false)` (downcastTimestamp=false), so this 
exercises the third sub-case of the new branch — the panic at line 348 — which 
the two added rows don't cover. small enough to fold into this PR, wdyt?
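
   For reference, a minimal standalone sketch of the three sub-cases that branch distinguishes when `downcastTimestamp=false`. It is a model only, not the iceberg-go code: `nsTimestampKind`, `errInvalidSchema`, and the contents of `utcAliases` here are assumptions standing in for `ArrowTypeToIceberg`, `iceberg.ErrInvalidSchema`, and the package-level slice, and the returned strings are the Iceberg v3 type names rather than the Go type values.

   ```go
   package main

   import (
       "errors"
       "fmt"
       "slices"
   )

   // Hypothetical stand-in for iceberg.ErrInvalidSchema.
   var errInvalidSchema = errors.New("invalid schema")

   // Assumed contents; the real slice lives in table/arrow_utils.go.
   var utcAliases = []string{"UTC", "+00:00", "Etc/UTC", "Z"}

   // nsTimestampKind models the nanosecond branch with downcastTimestamp=false:
   // UTC aliases map to timestamptz_ns, an empty zone maps to timestamp_ns,
   // and any other zone is rejected.
   func nsTimestampKind(tz string) (string, error) {
       switch {
       case slices.Contains(utcAliases, tz):
           return "timestamptz_ns", nil
       case tz == "":
           return "timestamp_ns", nil
       default:
           return "", fmt.Errorf("%w: unsupported arrow type for conversion - timestamp[ns, tz=%s]",
               errInvalidSchema, tz)
       }
   }

   func main() {
       // One input per sub-case; the last one is what the suggested row pins down.
       for _, tz := range []string{"UTC", "", "US/Pacific"} {
           kind, err := nsTimestampKind(tz)
           fmt.Printf("tz=%q kind=%q err=%v\n", tz, kind, err)
       }
   }
   ```

   The two rows added in this PR correspond to the first two inputs; the suggested `US/Pacific` row corresponds to the third.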



##########
table/arrow_utils.go:
##########
@@ -342,10 +342,19 @@ func (c convertToIceberg) Primitive(dt arrow.DataType) (result iceberg.NestedFie
                        panic(fmt.Errorf("%w: unsupported arrow type for conversion - %s", iceberg.ErrInvalidSchema, dt))
                }
        case *arrow.TimestampType:
-               if dt.Unit == arrow.Nanosecond {
-                       if !c.downcastTimestamp {
-                               panic(fmt.Errorf("%w: 'ns' timestamp precision not supported", iceberg.ErrType))
+               if dt.Unit == arrow.Nanosecond && !c.downcastTimestamp {
+                       if slices.Contains(utcAliases, dt.TimeZone) {
+                               result.Type = iceberg.PrimitiveTypes.TimestampTzNs
+                       } else if dt.TimeZone == "" {
+                               result.Type = iceberg.PrimitiveTypes.TimestampNs
+                       } else {

Review Comment:
   worth a line in the PR body that UTC-tz and empty-tz nanoseconds with 
`downcastTimestamp=false` used to panic (`'ns' timestamp precision not 
supported`) and now map to `TimestampTzNs` / `TimestampNs`. correct behavior 
for v3, but anyone bisecting #695 down the line will appreciate the explicit 
note that this is a behavior change for v1/v2 callers — they'll now produce a 
v3 schema and trip on `minFormatVersionForType` instead of failing at 
conversion. small enough to fold in here.



##########
table/arrow_utils.go:
##########
@@ -342,10 +342,19 @@ func (c convertToIceberg) Primitive(dt arrow.DataType) (result iceberg.NestedFie
                        panic(fmt.Errorf("%w: unsupported arrow type for conversion - %s", iceberg.ErrInvalidSchema, dt))
                }
        case *arrow.TimestampType:
-               if dt.Unit == arrow.Nanosecond {
-                       if !c.downcastTimestamp {
-                               panic(fmt.Errorf("%w: 'ns' timestamp precision not supported", iceberg.ErrType))
+               if dt.Unit == arrow.Nanosecond && !c.downcastTimestamp {
+                       if slices.Contains(utcAliases, dt.TimeZone) {
+                               result.Type = iceberg.PrimitiveTypes.TimestampTzNs
+                       } else if dt.TimeZone == "" {
+                               result.Type = iceberg.PrimitiveTypes.TimestampNs
+                       } else {
+                               panic(fmt.Errorf("%w: unsupported arrow type for conversion - %s", iceberg.ErrInvalidSchema, dt))
                        }
+
+                       return result

Review Comment:
   tiny readability nit while we're here — `if dt.Unit == arrow.Nanosecond` 
shows up twice in a row with an early `return` in between. could fold into one 
block to keep the two ns paths adjacent:
   
   ```go
   if dt.Unit == arrow.Nanosecond {
       if !c.downcastTimestamp {
           switch {
           case slices.Contains(utcAliases, dt.TimeZone):
               result.Type = iceberg.PrimitiveTypes.TimestampTzNs
           case dt.TimeZone == "":
               result.Type = iceberg.PrimitiveTypes.TimestampNs
           default:
               panic(fmt.Errorf("%w: unsupported arrow type for conversion - %s", iceberg.ErrInvalidSchema, dt))
           }
           return result
       }
       slog.Warn("downcasting nanosecond timestamp to microsecond, precision loss may occur")
   }
   ```
   
   equivalent semantics, take it or leave it.
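
   A small standalone sketch of why the two shapes should be equivalent, checking every combination of the two booleans. This is a control-flow model only: `twoBlocks`, `folded`, and the step labels are hypothetical, and the second `if` in `twoBlocks` (the downcast warning) is assumed from the description above since it is not visible in the hunk.

   ```go
   package main

   import "fmt"

   // twoBlocks models the shape in the PR: an ns-and-not-downcast block with an
   // early return, followed by a second ns check for the downcast warning (assumed).
   func twoBlocks(isNs, downcast bool) []string {
       var steps []string
       if isNs && !downcast {
           steps = append(steps, "map to ns type")
           return steps
       }
       if isNs {
           steps = append(steps, "warn about downcast")
       }
       return append(steps, "continue with microsecond mapping")
   }

   // folded models the suggested shape: a single ns block holding both paths.
   func folded(isNs, downcast bool) []string {
       var steps []string
       if isNs {
           if !downcast {
               steps = append(steps, "map to ns type")
               return steps
           }
           steps = append(steps, "warn about downcast")
       }
       return append(steps, "continue with microsecond mapping")
   }

   func main() {
       // Compare the recorded steps for all four input combinations.
       for _, isNs := range []bool{false, true} {
           for _, downcast := range []bool{false, true} {
               a, b := twoBlocks(isNs, downcast), folded(isNs, downcast)
               fmt.Println(isNs, downcast, fmt.Sprint(a) == fmt.Sprint(b))
           }
       }
   }
   ```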



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]