yyanyy commented on pull request #1975:
URL: https://github.com/apache/iceberg/pull/1975#issuecomment-755053818


   > > Not sure if sort order should be nullable by default or 0 (from unsorted_order)
   > 
   > The field should be optional because v1 manifests will not have the order field. Iceberg will read the value as null, so I think it makes sense to use null. And you're right about not storing it for position deletes.
   > 
   > > Do we want only sort order id, or actual sort order struct?
   > 
   > We want the ID. Sort orders are attached to table metadata, so loading the order should be a simple hash map lookup.
   > 
   > > For the next PR, do we assume the table's current sort order id is the authoritative place to get sort order information when adding a new file?
   > 
   > No. Engines must specify which sort order was used to write a file explicitly. So this needs to be exposed in the DataFile and DeleteFile builders. By default, we should write either null or 0 (unordered). Probably null.
   
   Thank you for the response! 
   
   > We want the ID. Sort orders are attached to table metadata, so loading the order should be a simple hash map lookup.
   
   I guess that to do that we may need to add the sort order map to `FileScanTask`: readers (e.g. `RowDataReader`) rely on the task for reading rows, and they don't have the table available for a metadata lookup?
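   
   To make the question concrete, here is a rough sketch of what I have in mind. Everything in it is hypothetical: `SortOrderSketch` and `ScanTaskSketch` are simplified stand-ins, not existing Iceberg classes, and I am only assuming the task could carry an ID-to-order map next to the file's sort order ID.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a sort order; only the ID and a label are modeled.
final class SortOrderSketch {
    final int orderId;
    final String description;

    SortOrderSketch(int orderId, String description) {
        this.orderId = orderId;
        this.description = description;
    }
}

// Hypothetical stand-in for a scan task that carries the table's sort orders by ID,
// so a reader can resolve a file's order without access to the table.
final class ScanTaskSketch {
    private final Map<Integer, SortOrderSketch> ordersById;
    private final Integer fileSortOrderId; // null when the manifest has no order field (e.g. v1)

    ScanTaskSketch(Map<Integer, SortOrderSketch> ordersById, Integer fileSortOrderId) {
        this.ordersById = new HashMap<>(ordersById); // defensive copy
        this.fileSortOrderId = fileSortOrderId;
    }

    // Returns the file's sort order, or null when the ID is absent or unknown.
    SortOrderSketch sortOrder() {
        return fileSortOrderId == null ? null : ordersById.get(fileSortOrderId);
    }
}
```

   With this shape, a file read from a v1 manifest simply resolves to null instead of requiring a table lookup.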
   
   > Engines must specify which sort order was used to write a file explicitly.
   
   (Sorry for the naive question) How does the engine specify the sort order when writing files? I guess we will need to decide the sort order when building the writer (e.g. add a `sortOrder` parameter to the [`SparkWrite` writer factory](https://github.com/apache/iceberg/blob/master/spark3/src/main/java/org/apache/iceberg/spark/source/SparkWrite.java#L518)). Before that, for certain types of writers, maybe we use `table.sortOrder()` and somehow signal to the engine that the output data needs to be written in the given sort order, and for some write modes like fast append we directly pass unordered/null?
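   
   As a sketch of how I read "exposed in the builders" (again hypothetical: `DataFileSketch` and `withSortOrderId` are made-up names for illustration, not the current `DataFile` builder API), the writer would pass the ID explicitly and the builder would record null when nothing is given:

```java
// Hypothetical stand-in for a data file record; not the real Iceberg DataFile.
final class DataFileSketch {
    final String path;
    final Integer sortOrderId; // null means the writer did not specify an order

    private DataFileSketch(String path, Integer sortOrderId) {
        this.path = path;
        this.sortOrderId = sortOrderId;
    }

    static Builder builder() {
        return new Builder();
    }

    static final class Builder {
        private String path;
        private Integer sortOrderId = null; // default: nothing recorded

        Builder withPath(String newPath) {
            this.path = newPath;
            return this;
        }

        // The engine states explicitly which order it used to write the file.
        Builder withSortOrderId(int newSortOrderId) {
            this.sortOrderId = newSortOrderId;
            return this;
        }

        DataFileSketch build() {
            if (path == null) {
                throw new IllegalStateException("path is required");
            }
            return new DataFileSketch(path, sortOrderId);
        }
    }
}
```

   A sorted write would then call something like `builder().withPath(...).withSortOrderId(1).build()`, while a fast append would skip `withSortOrderId` and end up with null.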
   

