Re: Proposal to Reimplement Disk Usage API - Request for Feedback and Collaboration

Michael Sokolov Fri, 26 May 2023 08:03:11 -0700

Hi Deepika, that would be a welcome addition - we had an earlier
discussion about it; see the thread here:
https://markmail.org/message/hq7jvobsnxwp7iat


Please be careful not to copy code from Elastic, as it is not
shared under an open license that permits copying.

On Wed, May 24, 2023 at 3:19 PM Deepika Sharma
<[email protected]> wrote:
>
> Dear Community,
>
> I am writing to share some thoughts on the existing Disk Usage API. I believe
> there is an opportunity to improve its functionality and performance
> through a reimplementation.
> Currently, the best tool we have for this is based on a custom Codec that
> separates storage by field; to get the statistics we read an existing index
> and write it out with the custom codec, using addIndexes and a force-merge.
> This is time-consuming and inefficient, so it tends not to get done.
> We could instead do something similar to the functionality in Elasticsearch.
> The DiskUsage API <https://github.com/elastic/elasticsearch/pull/74051>
> estimates the storage of each field by iterating its structures (i.e.,
> inverted index, doc-values, stored fields, etc.) and tracking the number of
> bytes read. Since we would only enumerate the index, it wouldn't require us
> to force-merge all the data through addIndexes, and at the same time it
> wouldn't intrude on the codec APIs.
>
> Thank you for your time and consideration. I would greatly appreciate any
> input, suggestions, or concerns you might have regarding this proposal and
> eagerly look forward to your response.
>
> Best regards,
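
The read-bytes idea in the quoted proposal can be illustrated with a small,
language-neutral sketch. This is not Lucene or Elasticsearch code and is not
derived from either: `CountingReader`, `estimate_usage`, and the
`(field, offset, length)` layout are hypothetical names invented here, and
Python is used only for brevity. It assumes each field's structures can be
enumerated as byte ranges of a single stream.

```python
import io
from collections import defaultdict


class CountingReader:
    """Wraps a binary stream and charges every byte read to the current field."""

    def __init__(self, stream):
        self._stream = stream
        self._field = None
        self.bytes_by_field = defaultdict(int)

    def set_field(self, name):
        # All subsequent reads are attributed to this field.
        self._field = name

    def read(self, n):
        data = self._stream.read(n)
        self.bytes_by_field[self._field] += len(data)
        return data


def estimate_usage(stream, layout, chunk_size=4096):
    """Estimate per-field storage by reading each field's byte ranges.

    `layout` is a hypothetical list of (field, offset, length) tuples that
    stands in for iterating a field's postings, doc values, stored fields, etc.
    """
    reader = CountingReader(stream)
    for field, offset, length in layout:
        stream.seek(offset)
        reader.set_field(field)
        remaining = length
        while remaining > 0:
            data = reader.read(min(chunk_size, remaining))
            if not data:  # range extends past the end of the stream
                break
            remaining -= len(data)
    return dict(reader.bytes_by_field)


# Toy "index": two ranges belong to "title", one to "body".
index_bytes = b"t" * 10 + b"b" * 25 + b"t" * 5
index_layout = [("title", 0, 10), ("body", 10, 25), ("title", 35, 5)]
usage = estimate_usage(io.BytesIO(index_bytes), index_layout)
```

Because the counts come from reads against the existing data, nothing is
rewritten or merged; a real implementation would presumably wrap the index
input rather than rely on a precomputed layout.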

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]