RussellSpitzer opened a new pull request #2609:
URL: https://github.com/apache/iceberg/pull/2609
A rewrite strategy for data files that aims to reorder data within data files
so that they are laid out optimally in relation to a column. For example,
suppose the Sort strategy is used on a set of files ordered by column x, where
the original files are File A (x: 0 - 50), File B (x: 10 - 40), and
File C (x: 30 - 60). This strategy will attempt to rewrite those files into
File A' (x: 0 - 20), File B' (x: 21 - 40), and File C' (x: 41 - 60).
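
To make the example concrete, here is a minimal sketch of the idea, not the code in this PR. It assumes each file is just a list of x values; the class `SortRewriteSketch`, the method `rewriteSorted`, and the sample values are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class SortRewriteSketch {

  // Hypothetical helper: merge the rows of all input files, sort them by x,
  // and split them into at most numFiles equally sized outputs.
  // Returns one {min, max} pair per output file. Assumes numFiles > 0.
  static List<int[]> rewriteSorted(List<List<Integer>> inputFiles, int numFiles) {
    List<Integer> all = new ArrayList<>();
    for (List<Integer> file : inputFiles) {
      all.addAll(file);
    }
    Collections.sort(all);

    List<int[]> ranges = new ArrayList<>();
    if (all.isEmpty()) {
      return ranges;
    }
    int chunk = (all.size() + numFiles - 1) / numFiles; // ceiling division
    for (int start = 0; start < all.size(); start += chunk) {
      int end = Math.min(start + chunk, all.size());
      ranges.add(new int[] {all.get(start), all.get(end - 1)});
    }
    return ranges;
  }

  public static void main(String[] args) {
    // Overlapping inputs, loosely modeled on File A, File B, and File C above.
    List<List<Integer>> files = Arrays.asList(
        Arrays.asList(0, 25, 50),   // File A (x: 0 - 50)
        Arrays.asList(10, 20, 40),  // File B (x: 10 - 40)
        Arrays.asList(30, 45, 60)); // File C (x: 30 - 60)
    for (int[] r : rewriteSorted(files, 3)) {
      System.out.println("x: " + r[0] + " - " + r[1]);
    }
  }
}
```

With these sample values the output ranges no longer overlap, which is the property the strategy is after; the exact boundaries depend on how the data is distributed.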
Currently there is no clustering detection, and all files are rewritten
if {@link SortStrategy#REWRITE_ALL} is true (the default). If this property is
disabled, any files with an incorrect sort order, as well as any files that
would be chosen by {@link BinPackStrategy}, will be rewrite candidates.
In the future, other algorithms for determining which files to rewrite will be
provided.
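
The selection rule described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: `FileInfo`, `selectFiles`, and the size thresholds are hypothetical, and the bin-pack check is approximated here as "file size falls outside a [minSize, maxSize] range".

```java
import java.util.ArrayList;
import java.util.List;

class SortSelectionSketch {

  // Hypothetical stand-in for a data file's metadata.
  static final class FileInfo {
    final String name;
    final int sortOrderId;
    final long sizeBytes;

    FileInfo(String name, int sortOrderId, long sizeBytes) {
      this.name = name;
      this.sortOrderId = sortOrderId;
      this.sizeBytes = sizeBytes;
    }
  }

  // If rewriteAll is true, every file is a candidate. Otherwise a file is a
  // candidate when its sort order differs from the target, or when it would
  // be picked by the (assumed) size-based bin-pack check.
  static List<FileInfo> selectFiles(
      List<FileInfo> files, boolean rewriteAll, int targetSortOrderId,
      long minSize, long maxSize) {
    List<FileInfo> selected = new ArrayList<>();
    for (FileInfo file : files) {
      boolean wrongOrder = file.sortOrderId != targetSortOrderId;
      boolean binPackCandidate = file.sizeBytes < minSize || file.sizeBytes > maxSize;
      if (rewriteAll || wrongOrder || binPackCandidate) {
        selected.add(file);
      }
    }
    return selected;
  }
}
```

With the flag disabled, a correctly sorted file of acceptable size is left alone, while a file with a different sort order or an out-of-range size is still picked up.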