Hi Rui,

Thanks for your response. The snippet you shared is great, but it is not exactly what I was looking for. It gives us the total size of the .vec files, whereas I want to look inside the .vec files and compute a map from vector_field_name to the number of bytes that field uses on disk.
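To make the desired output concrete, here is a rough sketch of such a map in plain Java. It does not read a Lucene index and is not a Lucene API. The class `VectorFieldBytes`, the `FieldStats` holder, and the field names are hypothetical, and the per-field vector count, dimension, and bytes per component are assumed to be known from elsewhere. It estimates only the raw vector payload as count × dimension × bytes per component, assuming 4 bytes per float32 component and 1 byte per byte-encoded component. It ignores graph data, metadata, quantized copies, and any padding, so the real on-disk numbers may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class VectorFieldBytes {

    /** Hypothetical per-field stats; in a real index these would have to be read from the segments. */
    public static final class FieldStats {
        final int vectorCount;
        final int dimension;
        final int bytesPerComponent; // assumption: 4 for float32, 1 for byte vectors

        public FieldStats(int vectorCount, int dimension, int bytesPerComponent) {
            this.vectorCount = vectorCount;
            this.dimension = dimension;
            this.bytesPerComponent = bytesPerComponent;
        }
    }

    /** Raw vector payload only: count * dimension * bytes per component, computed in long to avoid overflow. */
    public static long rawVectorBytes(FieldStats s) {
        return (long) s.vectorCount * s.dimension * s.bytesPerComponent;
    }

    /** Builds the map of vector field name to estimated bytes, preserving insertion order. */
    public static Map<String, Long> bytesByField(Map<String, FieldStats> stats) {
        Map<String, Long> out = new LinkedHashMap<>();
        for (Map.Entry<String, FieldStats> e : stats.entrySet()) {
            out.put(e.getKey(), rawVectorBytes(e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, FieldStats> stats = new LinkedHashMap<>();
        stats.put("title_vector", new FieldStats(1_000, 768, Float.BYTES)); // hypothetical field
        stats.put("image_vector", new FieldStats(1_000, 128, Byte.BYTES));  // hypothetical field
        System.out.println(bytesByField(stats));
        // prints {title_vector=3072000, image_vector=128000}
    }
}
```

An estimate like this is only a lower bound on what a field occupies; an exact figure would need the per-field offsets and lengths recorded by the codec.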
Thanks,
Tanmay

On Wed, 30 Oct 2024 at 13:30, Rui Wu <rui...@mongodb.com.invalid> wrote:

> Hi Tanmay,
>
> Are you bothered by the .vec files hidden within the compound files? If
> yes, I have a snippet that can sum up the .vec files inside and outside
> compound files.
> https://gist.github.com/wurui90/28de20d46079108d7ae5ed181ba939d4
>
> On Tue, Oct 29, 2024 at 12:08 PM Tanmay Goel <goeltan...@gmail.com> wrote:
>
> > Hi all
> >
> > I recently joined the Lucene team at Amazon and this is my first time
> > working with Lucene so any help will be appreciated.
> >
> > One of my first tasks is to *add a metric in production to track the RAM /
> > disk usage of vector fields*. We want to use this metric to decide when to
> > scale our deployments.
> >
> > One of the ideas to get this data was to split the index files such that we
> > have separate files for each field and prefix filenames with the
> > field name. We could then analyze the index files and figure out how many
> > bytes are used for each field. However, this idea is called out as a bad
> > practice in Lucene docs
> > (https://github.com/apache/lucene/blob/main/dev-docs/file-formats.md#dont-use-too-many-files)
> >
> > Is there any other way to find out how many bytes are being used by vector
> > fields?
> >
> > Thanks!
> >
> > Tanmay