Hi Rui

Thanks for your response. The snippet you shared is great, but it is not
exactly what I was looking for. With it we can find the total size of the
.vec files, but I want to look inside the .vec files and compute a map from
vector field name to the number of bytes on disk.
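
To make the target concrete, here is a minimal sketch of the arithmetic behind
such a map. It is an estimate under stated assumptions, not a reading of the
.vec file layout: it counts only raw, unquantized vector data (vector count x
dimension x bytes per element) and ignores the HNSW graph, any quantized
copies, headers, and footers. VectorBytesEstimator and FieldStats are
hypothetical names, not Lucene classes. In a real index the inputs could
presumably be collected per segment, for example from
FloatVectorValues.size() or ByteVectorValues.size(),
FieldInfo.getVectorDimension(), and FieldInfo.getVectorEncoding().byteSize.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of Lucene): estimates raw vector bytes per field.
final class VectorBytesEstimator {

    // Hypothetical holder for one vector field's stats in one segment.
    static final class FieldStats {
        final String fieldName;
        final long vectorCount;    // number of vectors stored for the field
        final int dimension;       // vector dimension
        final int bytesPerElement; // assumed 4 for float32, 1 for byte vectors

        FieldStats(String fieldName, long vectorCount, int dimension, int bytesPerElement) {
            this.fieldName = fieldName;
            this.vectorCount = vectorCount;
            this.dimension = dimension;
            this.bytesPerElement = bytesPerElement;
        }
    }

    // Sums count * dimension * bytesPerElement per field across all segments.
    static Map<String, Long> estimate(List<FieldStats> perSegmentStats) {
        Map<String, Long> bytesByField = new LinkedHashMap<>();
        for (FieldStats s : perSegmentStats) {
            long bytes = s.vectorCount * s.dimension * s.bytesPerElement;
            bytesByField.merge(s.fieldName, bytes, Long::sum);
        }
        return bytesByField;
    }

    public static void main(String[] args) {
        // Made-up example values: two segments for one field, one for another.
        List<FieldStats> stats = List.of(
            new FieldStats("title_vec", 1000L, 768, 4),
            new FieldStats("title_vec", 500L, 768, 4),
            new FieldStats("image_vec", 200L, 128, 1));
        estimate(stats).forEach((field, bytes) ->
            System.out.println(field + " -> " + bytes + " bytes"));
    }
}
```

The field names and counts in main are invented for illustration. Comparing
the summed estimate against the total .vec size from the gist would show how
much the ignored overhead matters in practice.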

Thanks
Tanmay

On Wed, 30 Oct 2024 at 13:30, Rui Wu <rui...@mongodb.com.invalid> wrote:

> Hi Tanmay,
>
> Are you bothered by the .vec files hidden within the compound files? If
> yes, I have a snippet that can sum up the .vec files inside and outside
> compound files.
> https://gist.github.com/wurui90/28de20d46079108d7ae5ed181ba939d4
>
> On Tue, Oct 29, 2024 at 12:08 PM Tanmay Goel <goeltan...@gmail.com> wrote:
>
> > Hi all
> >
> > I recently joined the Lucene team at Amazon and this is my first time
> > working with Lucene so any help will be appreciated.
> >
> > One of my first tasks is to *add a metric in production to track the RAM /
> > disk usage of vector fields*. We want to use this metric to decide when to
> > scale our deployments.
> >
> > One of the ideas to get this data was to split the index files such that we
> > have separate files for each field and prefix filenames with the
> > field name. We could then analyze the index files and figure out how many
> > bytes are used for each field. However, this idea is called out as a bad
> > practice in Lucene docs
> > (https://github.com/apache/lucene/blob/main/dev-docs/file-formats.md#dont-use-too-many-files).
> >
> > Is there any other way to find out how many bytes are being used by vector
> > fields?
> >
> > Thanks!
> >
> > Tanmay
> >
>
