PranjalChaitanya opened a new pull request, #779:
URL: https://github.com/apache/iceberg-go/pull/779
Part of #589.

This PR implements support for write-default values when projecting Arrow batches in `ToRequestedSchema`. If a column is missing from the batch but the schema specifies a `WriteDefault` for it, a column populated with the default value is created. To generate the column, this PR uses `MakeArrayFromScalar()`. This is similar to how [PyIceberg](https://github.com/apache/iceberg-python/blob/main/pyiceberg/io/pyarrow.py#L1998-L2009) handles write-defaults.

One complication I ran into was converting Iceberg default values into Arrow scalars. In the schema, `WriteDefault` is stored as `any`, so its concrete Go type is not known at compile time and has to be recovered at runtime. Iceberg default values are often stored using named Go types such as `Date`, `Time`, or `Timestamp`, which wrap primitive values like `int32` or `int64`. Arrow’s scalar helpers (`MakeScalar`, `MakeScalarParam`) infer the scalar type from the Go value, and because these Iceberg types are not Arrow’s own types (such as `arrow.Date32`), Arrow may interpret them as generic numeric scalars instead of their intended logical types. For example, an `iceberg.Date` may be interpreted as a plain integer rather than a `date32` scalar. The numeric value would still be correct, but the resulting Arrow array would have the wrong logical type.

The Java and Python implementations don’t run into this issue in the same way. From what I can tell, Java’s core writers don’t use Arrow. In Python, `pa.scalar(value, type=...)` specifies the Arrow type explicitly during scalar construction, so PyArrow does not need to infer the type from the Python value.

Because Go stores the default as `any` in the schema, some runtime dispatch is required to normalize the value before constructing the Arrow scalar. The implementation in this PR handles those cases so that the resulting Arrow array matches the schema’s logical type.

I added tests covering write-default values across different Iceberg types.
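The runtime dispatch described above can be sketched in plain Go. This is a minimal, standard-library-only illustration of the idea, not the PR's code: `Date`, `Time`, `Timestamp`, `inferNaive`, `normalizeDefault`, `Column`, and `fillColumn` are hypothetical stand-ins for the iceberg-go default types and the Arrow scalar/array machinery, and the logical-type labels are illustrative strings rather than Arrow type objects. It contrasts a helper that only sees the underlying Go kind with a type switch that recovers the intended logical type before the column is built.

```go
package main

import (
	"fmt"
	"reflect"
)

// Hypothetical stand-ins for iceberg-go's named default-value types.
// Assumption: like the types described above, each wraps a primitive integer.
type Date int32      // days since the Unix epoch
type Time int64      // microseconds since midnight
type Timestamp int64 // microseconds since the Unix epoch

// inferNaive models a helper that looks only at the underlying Go kind.
// A named type such as Date reports reflect.Int32, so its logical meaning
// (a calendar date) is lost and it looks like a plain integer.
func inferNaive(v any) string {
	switch reflect.ValueOf(v).Kind() {
	case reflect.Int32:
		return "int32"
	case reflect.Int64:
		return "int64"
	default:
		return "other"
	}
}

// normalizeDefault dispatches on the dynamic type of a default stored as
// `any` and returns the raw primitive together with the intended logical
// type. The labels are illustrative, not Arrow's type objects.
func normalizeDefault(v any) (raw any, logical string, err error) {
	switch x := v.(type) {
	case Date:
		return int32(x), "date32", nil
	case Time:
		return int64(x), "time64(us)", nil
	case Timestamp:
		return int64(x), "timestamp(us)", nil
	case int32:
		return x, "int32", nil
	case int64:
		return x, "int64", nil
	case string:
		return x, "utf8", nil
	default:
		return nil, "", fmt.Errorf("unsupported write-default type %T", v)
	}
}

// Column is a hypothetical stand-in for an Arrow array: an explicit logical
// type plus its values. The type is supplied by the caller, never inferred.
type Column struct {
	Type   string
	Values []any
}

// fillColumn plays the role MakeArrayFromScalar plays in the PR: it repeats
// one default value n times under the given logical type.
func fillColumn(raw any, logical string, n int) Column {
	vals := make([]any, n)
	for i := range vals {
		vals[i] = raw
	}
	return Column{Type: logical, Values: vals}
}

func main() {
	var stored any = Date(19000) // as held in the schema: statically just `any`
	fmt.Println("naive inference:", inferNaive(stored))

	raw, logical, err := normalizeDefault(stored)
	if err != nil {
		panic(err)
	}
	col := fillColumn(raw, logical, 3)
	fmt.Println("projected column:", col.Type, col.Values)
}
```

Returning the logical type alongside the raw value mirrors the PyArrow approach of passing `type=` explicitly: the column's type comes from schema-driven dispatch rather than from whatever a helper infers from the Go value. Unsupported default types surface as an error instead of silently becoming a generic numeric column.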
If there is a simpler or more idiomatic way to perform this conversion within the Arrow or Iceberg-Go codebase, I would be very open to changing the implementation.
