Hi Tanmay,

Are you bothered by the .vec files hidden inside compound files? If so, I have a snippet that sums up the sizes of the .vec files both inside and outside compound files:
https://gist.github.com/wurui90/28de20d46079108d7ae5ed181ba939d4
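The gist's contents aren't reproduced here. As a minimal sketch of the easier half of the problem, assuming the index lives in a local filesystem directory (for example one opened with FSDirectory), the following sums the on-disk size of standalone files with a given extension such as "vec". The class and method names (`VectorFileBytes`, `sumBytesByExtension`) are hypothetical. It does not open .cfs compound files, so vector data packed inside them is only counted as part of the .cfs total; getting at those entries means reading the compound file through Lucene, which is what the linked snippet is described as handling.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;

// Hypothetical helper (not the linked gist): sums sizes of standalone index files by extension.
public class VectorFileBytes {

    /**
     * Returns the total size in bytes of regular files directly inside {@code indexDir}
     * whose extension (text after the last '.') is in {@code extensions}, e.g. Set.of("vec").
     * Assumption: a filesystem-backed index. Files inside .cfs compound files are NOT inspected.
     */
    public static long sumBytesByExtension(Path indexDir, Set<String> extensions) throws IOException {
        long total = 0;
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(indexDir)) {
            for (Path file : stream) {
                if (!Files.isRegularFile(file)) {
                    continue; // skip subdirectories and other non-regular entries
                }
                String name = file.getFileName().toString();
                int dot = name.lastIndexOf('.');
                if (dot < 0) {
                    continue; // e.g. segments_N has no extension
                }
                if (extensions.contains(name.substring(dot + 1))) {
                    total += Files.size(file);
                }
            }
        }
        return total;
    }
}
```

Usage would look like `VectorFileBytes.sumBytesByExtension(Path.of("/path/to/index"), Set.of("vec"))`. Other vector-related extensions can be added to the set if they should be counted as well.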
On Tue, Oct 29, 2024 at 12:08 PM Tanmay Goel <goeltan...@gmail.com> wrote:
> Hi all
>
> I recently joined the Lucene team at Amazon and this is my first time
> working with Lucene so any help will be appreciated.
>
> One of my first tasks is to *add a metric in production to track the RAM /
> disk usage of vector fields*. We want to use this metric to decide when to
> scale our deployments.
>
> One of the ideas to get this data was to split the index files such that we
> have separate files for each field and prefix filenames with the
> field name. We could then analyze the index files and figure out how many
> bytes are used for each field. However, this idea is called out as a bad
> practice in Lucene docs (
>
> https://github.com/apache/lucene/blob/main/dev-docs/file-formats.md#dont-use-too-many-files
> )
>
> Is there any other way to find out how many bytes are being used by vector
> fields?
>
> Thanks!
>
> Tanmay
>