GrigorievNick edited a comment on issue #2903:
URL: https://github.com/apache/iceberg/issues/2903#issuecomment-891647675


   >You can set 'write.storage-object.enabled'=true in the table properties, 
which will append some randomness to the beginning of the path for the data. 
This would still keep the folder hierarchy but would allow you to spread data 
files across multiple partitions.
   
   If every file gets a random prefix, that will satisfy my requirements. 
(As I understand it, S3's policy is to create partitions based on the first 
3 characters of the key.)
   Unfortunately, I do not see any change when I specify 
`'write.storage-object.enabled'=true`.
   @kbendick, which Iceberg version has it? I can't even find this string in 
the 
[code](https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/TableProperties.java).
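
   To make the idea of a per-file prefix concrete, here is a minimal sketch. 
It is not Iceberg's implementation: the function name 
`prefixed_data_location`, the 8-character prefix, and the use of MD5 are all 
assumptions made for illustration. It puts a short hash in front of the 
partition path, so the folder hierarchy is kept while files of one partition 
land under different leading key characters.

```python
import hashlib


def prefixed_data_location(data_root: str, partition_path: str, filename: str) -> str:
    """Return a data file location with a deterministic pseudo-random prefix.

    Hypothetical helper: the layout and the hash choice are illustrative
    assumptions, not what Iceberg actually writes.
    """
    # Hash the logical path so the prefix is stable for a given file.
    logical_path = f"{partition_path}/{filename}"
    prefix = hashlib.md5(logical_path.encode("utf-8")).hexdigest()[:8]
    # The prefix comes first; the partition hierarchy is kept after it.
    return f"{data_root.rstrip('/')}/{prefix}/{logical_path}"


if __name__ == "__main__":
    for name in ("00000-0-a.parquet", "00001-0-b.parquet"):
        print(prefixed_data_location("s3://bucket/table/data", "day=2021-08-03", name))
```

   Because the prefix is derived from the path rather than drawn at random, 
the same file always maps to the same location, which makes the layout 
reproducible in tests.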
   
   >To be able to have full control over the data layout, you can implement 
your own LocationProvider and then set write.location-provider.impl. Here is a 
test that uses a custom location provider: 
https://github.com/apache/iceberg/blob/90225d6c9413016d611e2ce5eff37db1bc1b4fc5/core/src/test/java/org/apache/iceberg/TestLocationProvider.java
   
   Thank you for your answer, I will try to do it.
   
   >But coalescing the files after the fact is something that you'd need to do 
with some sort of Iceberg API (e.g. maybe read in the files to coalesce, create 
a temporary view of that data, and then use MERGE INTO or INSERT to handle the 
write), so that appropriate metadata is written.
   
   I think I can write my own rewrite based on the current rewrite actions.
   The steps for splitting a partition:
   1. Scan the statistics and find which partition is too big.
   2. Calculate the new ranges.
   3. Since the files are sorted by the column used to build the ranges, find 
the specific files that must be split across the new partitions.
   4. Split each such file into two.
   5. Take all files from the previous partition and create two new manifests 
with the new data.
   6. Mark all files in the previous manifest as deleted.
   
   But this is possible only if the data is physically stored in one folder 
and only the metadata controls the partition layout.
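
   The steps above can be sketched on in-memory file statistics. This is a 
planning sketch only and does not use the Iceberg API: the `DataFile` shape, 
the function names, the midpoint boundary, the `-a`/`-b` partition labels, 
and the `.left`/`.right` paths are hypothetical, and the sizes of split files 
are estimated by assuming values are spread evenly within a file. Files that 
fall entirely on one side of the boundary keep their path and only change 
their partition label, which is why the approach needs the physical location 
to be independent of the partition.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class DataFile:
    """Per-file statistics, as a manifest entry might expose them (hypothetical shape)."""
    path: str
    partition: str   # logical partition label, tracked only in metadata
    lower: int       # min value of the sort column in this file
    upper: int       # max value of the sort column in this file
    size_bytes: int


def oversized_partitions(files: List[DataFile], max_bytes: int) -> List[str]:
    """Step 1: sum file sizes per partition and return those above the limit."""
    totals: Dict[str, int] = {}
    for f in files:
        totals[f.partition] = totals.get(f.partition, 0) + f.size_bytes
    return [p for p, total in totals.items() if total > max_bytes]


def split_point(files: List[DataFile], partition: str) -> int:
    """Step 2: choose a boundary; the midpoint of the value range is one simple choice."""
    members = [f for f in files if f.partition == partition]
    lo = min(f.lower for f in members)
    hi = max(f.upper for f in members)
    return (lo + hi) // 2


def plan_split(
    files: List[DataFile], partition: str, boundary: int
) -> Tuple[List[DataFile], List[DataFile], List[DataFile]]:
    """Steps 3-6: return (left entries, right entries, entries to mark deleted).

    Values <= boundary go left, values > boundary go right.
    """
    left: List[DataFile] = []
    right: List[DataFile] = []
    deleted: List[DataFile] = []
    for f in files:
        if f.partition != partition:
            continue
        deleted.append(f)  # step 6: every old entry is marked deleted
        if f.upper <= boundary:
            left.append(replace(f, partition=f"{partition}-a"))   # same path, new label
        elif f.lower > boundary:
            right.append(replace(f, partition=f"{partition}-b"))  # same path, new label
        else:
            # Steps 3-4: the file straddles the boundary and must be split in two.
            # Sizes are estimates; a real rewrite would read and rewrite the rows.
            share = (boundary - f.lower + 1) / (f.upper - f.lower + 1)
            left_size = int(f.size_bytes * share)
            left.append(DataFile(f"{f.path}.left", f"{partition}-a",
                                 f.lower, boundary, left_size))
            right.append(DataFile(f"{f.path}.right", f"{partition}-b",
                                  boundary + 1, f.upper, f.size_bytes - left_size))
    return left, right, deleted


if __name__ == "__main__":
    table = [
        DataFile("a.parquet", "p0", 0, 49, 100),
        DataFile("b.parquet", "p0", 40, 59, 100),
        DataFile("c.parquet", "p0", 60, 99, 100),
        DataFile("d.parquet", "p1", 0, 99, 50),
    ]
    for part in oversized_partitions(table, max_bytes=200):
        boundary = split_point(table, part)
        new_left, new_right, old = plan_split(table, part, boundary)
        print(part, boundary, len(new_left), len(new_right), len(old))
```

   The two returned lists correspond to the two new manifests in step 5. 
Committing them, together with the deletes, would still have to go through 
an Iceberg API so that the metadata stays consistent.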


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


