RussellSpitzer edited a comment on issue #1735:
URL: https://github.com/apache/iceberg/issues/1735#issuecomment-723373179


   From what I can tell, this actually shouldn't be working on 0.9.1; it only works because of a fluke.
   
   When running through `PruneColumns`, we check whether or not each field possesses a selected ID:
   
   ```java
   // Note: fieldSchema is looked up by position (field.pos()), not by field ID.
   Schema fieldSchema = fields.get(field.pos());
   // All primitives are selected by selecting the field, but map and list
   // types can be selected by projecting the keys, values, or elements.
   // This creates two conditions where the field should be selected: if the
   // id is selected or if the result of the field is non-null. The only
   // case where the converted field is non-null is when a map or list is
   // selected by lower IDs.
   if (selectedIds.contains(fieldId)) {
     filteredFields.add(copyField(field, field.schema(), fieldId));
   } else if (fieldSchema != null) {
     hasChange = true;
     filteredFields.add(copyField(field, fieldSchema, fieldId));
   }
   ```
   
   In the 0.9.1 table, this also correctly passes through the selected field IDs 0, 1, and 3.
   
   We still get a `field.pos()` of 2 for `data_file`, so it doesn't match. BUT `fields.get(2)` returns the partition schema in "r2" (`partition type:Record pos:0`).
   
   Now, we aren't actually looking for that field; we are looking for the `data_file` field. BUT because `fieldSchema != null`, we follow the secondary pruning pathway above. This means we add a copy that gives `data_file` the RECORD schema along with the `fieldId` we expect. Luckily, this gets pruned out at the Spark level, but we are reading the wrong column data here.
   
   The 0.8.0 table doesn't have the "r2" record placed in `fields` and instead just gets null, leading to the error at the beginning of this ticket.
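   To make the fluke concrete, here is a minimal, hypothetical Java model of the two branches in the excerpt above. It is not Iceberg's code: schemas are modeled as plain strings, and `PositionalLookupSketch`, `Field`, `prune`, the column names `col_a`/`col_b`, and the field IDs are all illustrative assumptions. Only the selected ID set {0, 1, 3}, `data_file` at position 2, and the "r2" record come from this comment. The sketch shows how the positional lookup keeps `data_file` when an unrelated non-null result happens to sit at position 2 (the 0.9.1 case) and drops it when that slot is null (the 0.8.0 case).

   ```java
   import java.util.ArrayList;
   import java.util.Arrays;
   import java.util.List;
   import java.util.Set;

   // Hypothetical, simplified model of the branch logic in PruneColumns; not Iceberg code.
   // Schemas are strings; a null child result means "nothing selected below this field".
   public class PositionalLookupSketch {

     // Hypothetical stand-in for an Avro field: name, Iceberg field ID, and position.
     record Field(String name, int id, int pos) {}

     static List<String> prune(List<Field> recordFields, List<String> childResults,
                               Set<Integer> selectedIds) {
       List<String> kept = new ArrayList<>();
       for (Field field : recordFields) {
         // Same shape as the excerpt: the child result is fetched by position.
         String fieldSchema = childResults.get(field.pos());
         if (selectedIds.contains(field.id())) {
           kept.add(field.name());
         } else if (fieldSchema != null) {
           // Secondary pathway: the field survives with whatever sat at that position.
           kept.add(field.name() + ":" + fieldSchema);
         }
       }
       return kept;
     }

     public static void main(String[] args) {
       // Illustrative IDs; data_file's ID (2) is assumed not to be in the selected set.
       List<Field> record = List.of(
           new Field("col_a", 0, 0), new Field("col_b", 1, 1), new Field("data_file", 2, 2));
       Set<Integer> selected = Set.of(0, 1, 3);

       // 0.9.1-like: position 2 holds an unrelated non-null result ("r2").
       System.out.println(prune(record, Arrays.asList(null, null, "r2"), selected));
       // 0.8.0-like: position 2 is null, so data_file is dropped entirely.
       System.out.println(prune(record, Arrays.asList(null, null, null), selected));
     }
   }
   ```

   Running `main` prints `[col_a, col_b, data_file:r2]` followed by `[col_a, col_b]`: the same record either keeps `data_file` with the wrong schema or loses it, depending only on what happens to occupy that position.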
   