Hi Tanmay,

Are you bothered by the .vec files hidden inside compound files? If so, I
have a snippet that sums up the sizes of .vec files both inside and outside
compound files:
https://gist.github.com/wurui90/28de20d46079108d7ae5ed181ba939d4
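For a rough idea of the approach, here is a minimal sketch (not the code in
the gist) that covers only the easy half: it sums the on-disk sizes of vector
files stored directly in the index directory, grouped by extension. The
extension list is an assumption based on the Lucene 9.x vector formats (.vec
raw vectors, .vex HNSW graph, .vem/.vemf metadata, .veq/.vemq quantized
vectors), so adjust it for your codec version. Vectors packed into a
segment's .cfs file are NOT counted here. Reaching those means opening the
segment through the codec's compound reader (Codec.compoundFormat()), which
is what the gist handles.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class VectorFileSizes {

    // Assumption: vector-related file extensions used by Lucene 9.x vector formats.
    static final Set<String> VECTOR_EXTENSIONS =
            Set.of("vec", "vem", "vex", "vemf", "veq", "vemq");

    /**
     * Sums file sizes per vector-related extension for files stored directly
     * in indexDir. Files packed inside compound (.cfs) files are not inspected.
     */
    public static Map<String, Long> vectorBytesByExtension(Path indexDir) throws IOException {
        Map<String, Long> totals = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue;
                }
                String name = file.getFileName().toString();
                int dot = name.lastIndexOf('.');
                if (dot < 0) {
                    continue; // e.g. segments_N, write.lock has an extension but is filtered below
                }
                String ext = name.substring(dot + 1);
                if (VECTOR_EXTENSIONS.contains(ext)) {
                    totals.merge(ext, Files.size(file), Long::sum);
                }
            }
        }
        return totals;
    }

    public static void main(String[] args) throws IOException {
        Path indexDir = Paths.get(args.length > 0 ? args[0] : ".");
        Map<String, Long> totals = vectorBytesByExtension(indexDir);
        long sum = totals.values().stream().mapToLong(Long::longValue).sum();
        totals.forEach((ext, bytes) -> System.out.println("." + ext + ": " + bytes + " bytes"));
        System.out.println("total vector bytes (outside compound files): " + sum);
    }
}
```

Run it as `java VectorFileSizes /path/to/index`. If your deployment never
writes compound files for segments with vectors, this alone may be enough;
otherwise the .cfs contents have to be added on top.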

On Tue, Oct 29, 2024 at 12:08 PM Tanmay Goel <goeltan...@gmail.com> wrote:

> Hi all
>
> I recently joined the Lucene team at Amazon and this is my first time
> working with Lucene, so any help will be appreciated.
>
> One of my first tasks is to *add a metric in production to track the RAM /
> disk usage of vector fields*. We want to use this metric to decide when to
> scale our deployments.
>
> One of the ideas to get this data was to split the index files such that we
> have separate files for each field and prefix filenames with the
> field name. We could then analyze the index files and figure out how many
> bytes are used for each field. However, this idea is called out as a bad
> practice in the Lucene docs:
>
> https://github.com/apache/lucene/blob/main/dev-docs/file-formats.md#dont-use-too-many-files
>
> Is there any other way to find out how many bytes are being used by vector
> fields?
>
> Thanks!
>
> Tanmay
>
