I cannot think of good ways to do this. Why is it important to break down
per field as opposed to scaling based on the total volume of vector data?
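
If an approximation is enough, one option is to estimate the raw vector
storage per field from the reader itself instead of from the files: each
stored vector occupies dimension * bytes-per-element in the .vec data, so
summing size() * dimension * element-width over the leaves gives a
per-field lower bound. A rough sketch (assuming recent Lucene 9.x APIs;
note it ignores the HNSW graph, quantized copies, and per-segment
metadata):

import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;
import org.apache.lucene.index.ByteVectorValues;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.FieldInfo;
import org.apache.lucene.index.FloatVectorValues;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.VectorEncoding;
import org.apache.lucene.store.FSDirectory;

public class VectorFieldSizeEstimator {
  public static void main(String[] args) throws Exception {
    Map<String, Long> bytesPerField = new HashMap<>();
    try (DirectoryReader reader =
        DirectoryReader.open(FSDirectory.open(Paths.get(args[0])))) {
      for (LeafReaderContext ctx : reader.leaves()) {
        for (FieldInfo fi : ctx.reader().getFieldInfos()) {
          if (fi.getVectorDimension() == 0) {
            continue; // not a vector field
          }
          boolean isFloat = fi.getVectorEncoding() == VectorEncoding.FLOAT32;
          // Number of vectors stored for this field in this segment.
          long count;
          if (isFloat) {
            FloatVectorValues v = ctx.reader().getFloatVectorValues(fi.name);
            count = v == null ? 0 : v.size();
          } else {
            ByteVectorValues v = ctx.reader().getByteVectorValues(fi.name);
            count = v == null ? 0 : v.size();
          }
          // Raw storage estimate: count * dimension * bytes per element.
          long raw = count * (long) fi.getVectorDimension()
              * (isFloat ? Float.BYTES : Byte.BYTES);
          bytesPerField.merge(fi.name, raw, Long::sum);
        }
      }
    }
    bytesPerField.forEach((field, bytes) ->
        System.out.println(field + " -> " + bytes + " raw vector bytes"));
  }
}

Even with that, exactly attributing the graph and compound-file bytes to
individual fields isn't possible with the current file layout, hence my
question above.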

On Tue, Nov 5, 2024 at 10:58 PM Tanmay Goel <goeltan...@gmail.com> wrote:

> Hi Rui,
>
> Thanks for your response. The snippet you shared is great, but it's not
> exactly what I was looking for. With that snippet we can find the total
> size of the .vec files, but I want to look inside the .vec files and
> compute a map from vector_field_name to the number of bytes on disk.
>
> Thanks
> Tanmay
>
> On Wed, 30 Oct 2024 at 13:30, Rui Wu <rui...@mongodb.com.invalid> wrote:
>
> > Hi Tanmay,
> >
> > Are you bothered by the .vec files hidden within the compound files? If
> > yes, I have a snippet that can sum up the .vec files inside and outside
> > compound files.
> > https://gist.github.com/wurui90/28de20d46079108d7ae5ed181ba939d4
> >
> > On Tue, Oct 29, 2024 at 12:08 PM Tanmay Goel <goeltan...@gmail.com>
> > wrote:
> >
> > > Hi all
> > >
> > > I recently joined the Lucene team at Amazon and this is my first time
> > > working with Lucene so any help will be appreciated.
> > >
> > > One of my first tasks is to *add a metric in production to track the
> > > RAM / disk usage of vector fields*. We want to use this metric to
> > > decide when to scale our deployments.
> > >
> > > One of the ideas to get this data was to split the index files such
> > > that we have separate files for each field and prefix filenames with
> > > the field name. We could then analyze the index files and figure out
> > > how many bytes are used for each field. However, this idea is called
> > > out as a bad practice in the Lucene docs (
> > > https://github.com/apache/lucene/blob/main/dev-docs/file-formats.md#dont-use-too-many-files
> > > )
> > >
> > > Is there any other way to find out how many bytes are being used by
> > > vector fields?
> > >
> > > Thanks!
> > >
> > > Tanmay
> > >
> >
>


-- 
Adrien