Re: [lucy-dev] Clownfish VArray API

Marvin Humphrey Fri, 24 Apr 2015 20:05:59 -0700

On Fri, Apr 24, 2015 at 4:40 AM, Nick Wellnhofer <[email protected]> wrote:


> Instead of removing methods, we can always consider leaving them
> undocumented. From a practical point of view, I like the idea of having
> methods that are not part of the public API and subject to change or
> removal, but might be useful to people who "know what they're doing".

As a general principle, nothing makes me happier than deleting code.
Simplicity is powerful.

In the specific case of the functionality provided by `Grow` and
`Get_Capacity`, I'm OK with any option: keep, remove, or leave undocumented.

On a related note: `Resize` was omitted from the list of public methods at the
top of this thread, but it's important.  It should be public, and its
semantics should be clarified to indicate that it will grow the backing array
if necessary but never shrink it.
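
To make those semantics concrete, here is a minimal sketch in C.  The `Vec`
struct and the `Vec_reserve`/`Vec_resize` functions are hypothetical stand-ins
for illustration, not Clownfish's actual implementation, and refcounting and
overflow checks are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a Vector: `size` is the logical element count,
 * `cap` is the number of slots in the backing array. */
typedef struct {
    void   **elems;
    size_t   size;
    size_t   cap;
} Vec;

/* Ensure room for at least `capacity` elements.  Never shrinks.
 * Returns 1 on success, 0 if the allocation failed. */
static int
Vec_reserve(Vec *self, size_t capacity) {
    if (capacity <= self->cap) { return 1; }
    /* Assumption: capacity * sizeof(void*) does not overflow. */
    void **elems = realloc(self->elems, capacity * sizeof(void*));
    if (elems == NULL) { return 0; }
    self->elems = elems;
    self->cap   = capacity;
    return 1;
}

/* Set the logical size.  Grows the backing array if necessary, but leaves
 * the allocation alone when the new size is smaller. */
static int
Vec_resize(Vec *self, size_t size) {
    if (!Vec_reserve(self, size)) { return 0; }
    /* New slots start out NULL.  A real implementation would also release
     * any elements dropped when the size decreases. */
    for (size_t i = self->size; i < size; i++) {
        self->elems[i] = NULL;
    }
    self->size = size;
    assert(self->size <= self->cap);
    return 1;
}
```

Under this sketch, resizing an empty `Vec` to 10 and then to 3 leaves `size`
at 3 and `cap` at 10.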

If we keep the functionality of `Grow`, I think we should consider
renaming it to `Reserve`, a la std::vector.

    C++ std::vector -- `reserve`
    Python list -- no API
    Ruby Array -- no API
    Perl array -- no direct API
    Java Vector -- `ensureCapacity`
    C# ArrayList -- `EnsureCapacity`
    Go slice -- `make` followed by `copy`

Additionally, if we supply `Reserve`, we should also supply a function which
triggers non-destructive shrinking of the allocation.  In C++, this is
`shrink_to_fit`, in Java it's `trimToSize`.  I suggest `Compact`.

    /** Request to shrink the Vector's backing array.
      *
      * Request that the Vector's backing array be reallocated to fit no more
      * than `capacity` elements or the current `size` -- whichever is
      * greater.  The Vector's logical content -- its `size` and its
      * elements -- will not be affected.
      *
      * @param capacity advisory maximum capacity.
      */
    public void
    Compact(Vector *self, size_t capacity);
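
One way the body of such a method might look, as a rough sketch only.  The
`Vec` struct and `Vec_compact` are hypothetical names used for illustration;
the real Vector layout may differ.  Because the request is advisory, a failed
reallocation simply leaves the existing allocation in place.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a Vector. */
typedef struct {
    void   **elems;
    size_t   size;
    size_t   cap;
} Vec;

/* Shrink the backing array to max(capacity, size) slots.  Never grows the
 * allocation and never touches the logical content. */
static void
Vec_compact(Vec *self, size_t capacity) {
    size_t target = capacity > self->size ? capacity : self->size;
    if (target >= self->cap) { return; }  /* nothing to shrink */

    if (target == 0) {
        /* realloc() with a size of 0 is implementation-defined, so free
         * the array explicitly instead. */
        free(self->elems);
        self->elems = NULL;
        self->cap   = 0;
        return;
    }

    void **elems = realloc(self->elems, target * sizeof(void*));
    if (elems == NULL) { return; }  /* advisory: keep the old allocation */
    self->elems = elems;
    self->cap   = target;
    assert(self->size <= self->cap);
}
```

With 4 elements in a 16-slot array, `Vec_compact(&v, 8)` would leave 8 slots,
and a subsequent `Vec_compact(&v, 0)` would leave 4, since the capacity is
never reduced below `size`.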

This same suite of methods -- `Resize`, `Get_Capacity`, `Reserve`, and
`Compact` -- should also be considered for CharBuf and ByteBuf.

> Regarding `Grow`, I find the factor of 1.125 (9/8) in `Memory_oversize`
> extremely conservative. It seems that Python uses the same value and I'm a
> bit puzzled why there's such a small difference in that benchmark. Maybe
> other things are at play here (too few runs, too small arrays, using
> `append` versus indexed assignment).
>
> Personally, I'd go with a factor of 1.5. Here are some values from other
> implementations:
>
> Java ArrayList   1.5
> Java Vector      2
> Python list      1.125
> Perl array       1.2
>
> Maybe 1.25 is a good compromise.

This topic was discussed extensively back in 2010 on both the Lucene dev list
and the Perl 5 Porters list:

  http://markmail.org/message/x427sku4wrnc3rjf
  http://markmail.org/message/up4cgq322lmtx5tw

I'm OK with 1.25.  The ideal ratio is case-dependent.
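
For a rough feel of the trade-off, the sketch below counts how many
reallocations a run of appends would trigger under a given growth ratio.
Both `oversize` and `count_reallocs` are hypothetical helpers written for
this illustration; `oversize` is a plain multiply-by-ratio and is not a copy
of `Memory_oversize`, which may round or align differently.

```c
#include <assert.h>
#include <stddef.h>

/* Return floor(minimum * num / den), i.e. `minimum` scaled by the growth
 * ratio num/den.  The division is split to postpone overflow; a production
 * version would need a real overflow check. */
static size_t
oversize(size_t minimum, size_t num, size_t den) {
    assert(den != 0 && num >= den);
    return (minimum / den) * num + (minimum % den) * num / den;
}

/* Simulate `n` appends to an initially empty array and count how many
 * times the backing array would have to be reallocated. */
static size_t
count_reallocs(size_t n, size_t num, size_t den) {
    size_t size = 0, cap = 0, reallocs = 0;
    for (size_t i = 0; i < n; i++) {
        if (size == cap) {
            /* Full: request room for one more element, plus headroom. */
            cap = oversize(size + 1, num, den);
            reallocs++;
        }
        size++;
    }
    return reallocs;
}
```

For example, `oversize(100, 9, 8)` gives 112 and `oversize(100, 5, 4)` gives
125.  A smaller ratio wastes less memory per array but reallocates more often
during a long run of appends, which is why the best value depends on the
workload.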

Marvin Humphrey
