gustavoatt opened a new issue, #6932:
URL: https://github.com/apache/iceberg/issues/6932

   ### Feature Request / Improvement
   
   ## Current behavior
   
   Currently, any write done through Spark SQL uses the table's [default partition spec ID](https://github.com/apache/iceberg/blob/master/spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkWrite.java#L632). We could write with a different spec ID, but that would require dropping down to Iceberg's lower-level APIs instead of a direct Spark SQL write.
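
   For illustration, a minimal sketch of such a write (assuming a SparkSession `spark`, a DataFrame `df`, and hypothetical table names); both forms resolve to the table's current default spec, with no per-write way to pick another registered spec:

   ```scala
   // Both writes pick up the table's default partition spec ID.
   spark.sql("INSERT INTO catalog.db.events SELECT * FROM staged_hourly_data")

   df.writeTo("catalog.db.events").append()
   ```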
   
   ## Use case context
   
   We have an internal use case where we create a table `table` with two different partition specs (one possible setup is sketched after this list):
   
   * `(ds, hr)`: this is the default partition spec, used to write data at an hourly cadence.
   * `(ds)`: this spec is used at the end of the day when compacting all 24 hours of data into a single partition. We do this both to compact these files efficiently and to keep track of when full daily partitions have landed.
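
   For context, here is one way a table can end up carrying both specs, using Iceberg's Spark SQL extensions (catalog and table names are hypothetical, and the spec IDs shown assume this exact history):

   ```scala
   // Assumes a SparkSession `spark` with the Iceberg SQL extensions enabled.
   spark.sql("""
     CREATE TABLE catalog.db.events (ds string, hr int, payload string)
     USING iceberg
     PARTITIONED BY (ds)
   """)  // registers spec 0: (ds)

   // Evolving the spec registers a new spec and makes it the table's default.
   spark.sql("ALTER TABLE catalog.db.events ADD PARTITION FIELD hr")  // spec 1: (ds, hr)
   ```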
   
   This works well for us except when our GDPR job rewrites a whole day of data: the rewrite uses the default spec, i.e. `(ds, hr)`, which ends up creating more files than needed.
   
   ## Proposed feature
   
   I would like to propose a feature that would let us override the default 
write spec ID when using Spark by passing a new `SparkWriteOption` called 
`output-spec-id`.
   
   Whenever `output-spec-id` is set, we will write data using that partition spec; otherwise we will keep using the default spec ID, as we do today.
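
   To make the proposal concrete, a sketch of what a write could look like once the option exists (`output-spec-id` is the proposed option, not a current API; the table name and spec ID are hypothetical):

   ```scala
   // Proposed usage, not yet implemented: pass the ID of the registered
   // partition spec that this write should use.
   df.writeTo("catalog.db.events")
     .option("output-spec-id", "0")   // e.g. write with the (ds) spec
     .overwritePartitions()
   ```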
   
   Is there any interest in this feature? I can work on a PR to get it enabled.
   
   ### Query engine
   
   Spark

