emkornfield commented on code in PR #16446:
URL: https://github.com/apache/iceberg/pull/16446#discussion_r3284336698


##########
format/spec.md:
##########
@@ -514,12 +514,16 @@ Partition field IDs must be reused if an existing 
partition spec contains an equ
 | **`truncate[W]`** | Value truncated to width `W` (see below)                 
    | `int`, `long`, `decimal`, `string`, `binary`                              
                                | Source type |
 | **`year`**        | Extract a date or timestamp year, as years from 1970     
    | `date`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns`      
                                | `int`       |
 | **`month`**       | Extract a date or timestamp month, as months from 
1970-01-01 | `date`, `timestamp`, `timestamptz`, `timestamp_ns`, 
`timestamptz_ns`                                      | `int`       |
-| **`day`**         | Extract a date or timestamp day, as days from 1970-01-01 
    | `date`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns`      
                                | `int`       |
+| **`day`**         | Extract a date or timestamp day, as days from 1970-01-01 
    | `date`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns`      
                                | `date` [1]  |
 | **`hour`**        | Extract a timestamp hour, as hours from 1970-01-01 
00:00:00  | `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns`        
                                      | `int`       |
 | **`void`**        | Always produces `null`                                   
    | Any                                                                       
                                | Source type or `int` |
 
 All transforms must return `null` for a `null` input value.
 
+Notes:
+
+1. The result type for `day` has been documented as both `int` and `date` in 
earlier revisions of this spec. The physical representation has always been a 
4-byte integer counting days from `1970-01-01`, regardless of whether the Avro 
field is annotated with `logicalType: date`. Readers may encounter manifests in 
either form; per the Avro specification, unrecognized logical type annotations 
are ignored, so the bytes on disk are identical.

Review Comment:
   > per the Avro specification, unrecognized logical type annotations are 
ignored, so the bytes on disk are identical.
   
   Small nit, but I don't think this really captures the problem with the 
different types. I think all Iceberg-compatible readers by definition must 
recognize `logicalType: date`. IIUC the issue is that Iceberg hasn't defined 
the promotion in either direction between `int` and `date`. If we are 
deferring to the Avro specification, [type 
resolution](https://avro.apache.org/docs/1.11.1/specification/#schema-resolution)
 might be the more applicable section. It doesn't seem to consider logical 
types, and I'm not clear whether that is intentional or an oversight. One could 
construe it as only talking about the physical type, or as meaning that the 
types are not equal and we are defining here a type promotion or 
reconciliation specific to Iceberg.
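
   To make the reconciliation point concrete, here is a minimal Python sketch 
of what a reader-side rule between the two forms could look like. It is only 
an illustration under assumptions: `days_from_epoch` and `normalize_day_value` 
are hypothetical names, not Iceberg or Avro APIs, and the choice to normalize 
to `int` rather than `date` is arbitrary.

```python
from datetime import date

EPOCH = date(1970, 1, 1)


def days_from_epoch(d: date) -> int:
    """Count days from 1970-01-01, the value the `day` transform describes."""
    return (d - EPOCH).days


def normalize_day_value(value):
    """Hypothetical reconciliation rule: accept a `day` partition value that
    was materialized either as a plain int or as a date, and return the
    int day ordinal. A null input stays null."""
    if value is None:
        return None
    if isinstance(value, date):
        return days_from_epoch(value)
    if isinstance(value, int):
        return value
    raise TypeError(f"unsupported day partition value: {value!r}")


# Both forms of the same partition value normalize to the same ordinal.
as_date = normalize_day_value(date(2022, 1, 1))
as_int = normalize_day_value(18993)
print(as_date, as_int, as_date == as_int)
```

   The point is that the mapping itself is trivial; what the spec would need 
to state is that readers apply it, in which direction, and for which manifest 
versions.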



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

