That helps a lot! Thank you Szehon for the detailed response!

ggg

On Fri, Jan 7, 2022 at 1:54 PM Szehon Ho <szehon.apa...@gmail.com> wrote:

> Sure, I guessed you were asking about the number of manifest files rather
> than entries.  There's always a tradeoff, some aspects being:
>
>    - More manifest files => better predicate pushdown (more manifest files
>    can be skipped during a query) and less chance of a concurrency conflict
>    (two transactions trying to modify the same manifest file, which leads
>    to a retry).
>    - Fewer manifest files => metadata queries (like show partitions) can
>    be faster.
>
> Each of these is a large topic itself that might be too big to go into
> here :)
>
> For us, the benefit of more manifest files is not as important as
> making metadata queries fast for our users.  So we have tuned
> commit.manifest.target-size-bytes to be a few times larger than the
> default.  We try to keep the manifest file count in the tens or hundreds
> for any table; once there are thousands, a 'show partitions' query takes
> a long time.
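
A rough way to reason about the numbers above, as a back-of-the-envelope sketch only: once manifests have been merged up to the target size, the manifest count is roughly the total size of all manifest entries divided by commit.manifest.target-size-bytes. The 8 MB default is the documented default for that property; the average bytes per entry (200 here) and the table size are made-up assumptions, since the real figure depends on schema width, column stats, and compression.

```python
import math

# Documented default for commit.manifest.target-size-bytes (8 MB).
DEFAULT_TARGET_SIZE_BYTES = 8 * 1024 * 1024


def estimate_manifest_count(num_entries, avg_entry_bytes, target_size_bytes):
    """Rough steady-state manifest count after merging to the target size.

    avg_entry_bytes is an assumed figure, not something Iceberg reports
    directly; measure it on a real table before trusting the estimate.
    """
    if num_entries <= 0:
        return 0
    total_bytes = num_entries * avg_entry_bytes
    return max(1, math.ceil(total_bytes / target_size_bytes))


# Hypothetical table: 1,000,000 data files at an assumed ~200 bytes per entry.
at_default = estimate_manifest_count(1_000_000, 200, DEFAULT_TARGET_SIZE_BYTES)
at_4x = estimate_manifest_count(1_000_000, 200, 4 * DEFAULT_TARGET_SIZE_BYTES)
print(at_default, at_4x)
```

This ignores unmerged manifests left by recent commits and the fact that manifests are not shared across partition specs, so a live table will usually sit above the estimate until a rewrite runs.
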
>
> We do need to run RewriteManifests periodically to keep the table in this
> shape (as we have too many commits), and we also use
> 'commit.manifest.min-count-to-merge' and 'commit.manifest-merge.enabled' to
> merge manifests on commit.
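
For anyone following along, here is a minimal sketch of how those three properties and a manifest rewrite might be expressed as Spark SQL. It only builds the statement strings; running them assumes a Spark session with the Iceberg SQL extensions enabled. The catalog and table names (`my_catalog`, `db.events`) and the chosen values are hypothetical, not a recommendation from this thread.

```python
def manifest_tuning_sql(table, target_size_bytes, min_count_to_merge,
                        merge_enabled=True):
    """Build an ALTER TABLE statement that sets the manifest merge properties."""
    props = {
        "commit.manifest.target-size-bytes": str(target_size_bytes),
        "commit.manifest.min-count-to-merge": str(min_count_to_merge),
        "commit.manifest-merge.enabled": str(merge_enabled).lower(),
    }
    body = ",\n  ".join(f"'{key}' = '{value}'" for key, value in props.items())
    return f"ALTER TABLE {table} SET TBLPROPERTIES (\n  {body}\n)"


def rewrite_manifests_sql(catalog, table):
    """Build a call to Iceberg's rewrite_manifests Spark procedure."""
    return f"CALL {catalog}.system.rewrite_manifests('{table}')"


# Hypothetical names and values: 32 MB target (4x the 8 MB default).
print(manifest_tuning_sql("my_catalog.db.events", 32 * 1024 * 1024, 100))
print(rewrite_manifests_sql("my_catalog", "db.events"))
```

Each string could then be passed to `spark.sql(...)`; the periodic rewrite would typically be scheduled separately from the writers.
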
>
> Hope that helps,
> Szehon
>
> On Fri, Jan 7, 2022 at 1:10 PM g. g. grey <g.g.g...@gmail.com> wrote:
>
>> Hi Szehon,
>>
>> Thanks. My apologies; I was too loose in my wording. I'll try to use the
>> terms from the spec.
>>
>> I was asking about the number of total manifest files, specifically the
>> number of `manifest_file` structs that are found in the manifest-list file.
>>
>> It sounds like "commit.manifest.target-size-bytes" controls the
>> target size when small manifest files are merged. That is great to know,
>> since it is configurable and will clearly have an impact on the number of
>> `manifest_file` structs.
>>
>> Is there a general order-of-magnitude target number of `manifest_file`
>> structs? Presumably that would dictate when one would want to merge
>> manifest files and/or data files.
>>
>> Thanks again!
>> ggg
>>
>>
>> On Fri, Jan 7, 2022 at 11:41 AM Szehon Ho <szehon.apa...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> There is one manifest entry per data file or delete file, so it depends
>>> on how many data files/delete files your table has.  The number of files
>>> is controlled mostly by the parallelism of the job that writes the table,
>>> though there are Iceberg RewriteDataFiles utilities that can compact as
>>> well (as in your link).
>>>
>>> The number of manifest files is another topic, controlled by
>>> "commit.manifest.target-size-bytes"
>>> (which should not affect the total number of manifest entries).
>>>
>>> Hope that helps,
>>> Szehon
>>>
>>> On Fri, Jan 7, 2022 at 9:39 AM g. g. grey <g.g.g...@gmail.com> wrote:
>>>
>>>> Hi folks,
>>>>
>>>> I am just getting started with Iceberg and I'm trying to build up some
>>>> intuition for how large the metadata will become for large, active tables.
>>>> Specifically, what is the order of magnitude of manifest entries that I
>>>> should reasonably expect in a manifest-list file? Is there a particular
>>>> range that is ideal and aimed for when cleaning up/maintaining a table?
>>>>
>>>> I found the maintenance page <https://iceberg.apache.org/#maintenance/>,
>>>> but I'm hoping to find rules of thumb based on people's experience with
>>>> using Iceberg.
>>>>
>>>> Thanks! If I've missed the info somewhere, a simple pointer would be
>>>> great.
>>>> ggg
>>>>
>>>
