Hi Deepika, that would be a welcome addition. We had an earlier discussion about it; see the thread here: https://markmail.org/message/hq7jvobsnxwp7iat
Please be careful not to copy the code from Elastic, as it is not shared under an open license that permits copying.

On Wed, May 24, 2023 at 3:19 PM Deepika Sharma <deeps.sharma0...@gmail.com> wrote:
>
> Dear Community,
>
> I am writing to share some thoughts on the existing Disk Usage API. I believe
> there is an opportunity to improve its functionality and performance
> through a reimplementation.
> Currently, the best tool we have for this is based on a custom Codec that
> separates storage by field. To get the statistics, we read an existing index
> and write it out using addIndexes and force-merging, with the custom
> codec. This is time-consuming and inefficient, and it tends not to get done.
> What we could do is similar to the functionality in Elasticsearch. The
> DiskUsage API <https://github.com/elastic/elasticsearch/pull/74051>
> estimates the storage of each field by iterating over its structures (e.g.,
> inverted index, doc values, stored fields) and tracking the number of
> bytes read. Since we would enumerate the index directly, it wouldn't require
> us to force-merge all the data through addIndexes, and at the same time it
> doesn't invade the codec APIs.
>
> Thank you for your time and consideration. I would greatly appreciate any
> input, suggestions, or concerns you might have regarding this proposal, and
> I eagerly look forward to your response.
>
> Best regards,
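To make the "track the number of bytes read per field" idea concrete, here is a minimal, independently written sketch in plain Java. It is not derived from the Elasticsearch code and does not use Lucene's Directory/IndexInput classes at all; the names PerFieldUsage, CountingInput and visitField are hypothetical. Each field's data is read through a wrapper that counts the bytes actually consumed, and the count is attributed to that field. In Lucene itself this would presumably mean wrapping the inputs that the codec readers use, which this sketch does not attempt.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;

public class PerFieldUsage {

    /** Wraps a stream and counts every byte actually read through it (skipped bytes are not counted). */
    static final class CountingInput extends FilterInputStream {
        private long bytesRead;

        CountingInput(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            int b = super.read();
            if (b != -1) {
                bytesRead++;
            }
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            if (n > 0) {
                bytesRead += n;
            }
            return n;
        }

        long bytesRead() {
            return bytesRead;
        }
    }

    // Insertion-ordered map from field name to the total bytes read for that field.
    private final Map<String, Long> bytesPerField = new LinkedHashMap<>();

    /** Hypothetical hook: drains one field's data and adds the bytes read to that field's total. */
    void visitField(String field, InputStream fieldData) throws IOException {
        CountingInput in = new CountingInput(fieldData);
        byte[] buf = new byte[64];
        while (in.read(buf) != -1) {
            // A real analyzer would decode postings, doc values, etc. here; this sketch only consumes the bytes.
        }
        bytesPerField.merge(field, in.bytesRead(), Long::sum);
    }

    Map<String, Long> usage() {
        return bytesPerField;
    }

    public static void main(String[] args) throws IOException {
        PerFieldUsage usage = new PerFieldUsage();
        // Fake payloads standing in for per-field index structures.
        usage.visitField("title", new ByteArrayInputStream(new byte[100]));
        usage.visitField("body", new ByteArrayInputStream(new byte[1000]));
        usage.visitField("title", new ByteArrayInputStream(new byte[20]));
        System.out.println(usage.usage()); // prints {title=120, body=1000}
    }
}
```

The appeal of this approach, as the proposal notes, is that it only observes reads, so it needs neither a rewrite of the index through addIndexes nor any change to the codec APIs.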