On Sat, Apr 18, 2015 at 9:10 AM, Nick Wellnhofer <[email protected]> wrote:
> Lucifers,
>
> Let's discuss the public API of VArray.

Big thing first: I think this class should be renamed.  `VArray` is not
intuitive.

I was going to suggest `Array`.  After some thought, I'm leaning towards
`Vector` a la C++ std::vector, because in Clownfish-flavored C "array" can be
a C array, while "vector" would always a Clownfish Vector.

> * `Push_VArray`

I suggest renaming this method to `Push_All`.

> * Grow

We can probably just remove this method. It's a minor optimization at best.

We use effectively the same algo as Python for amortizing reallocations;
here's someone benchmarking Python's algo:

    http://stackoverflow.com/a/311833

    simple append 0.0102
    pre-allocate  0.0098

    Conclusion. It barely matters.

If `Grow` goes, we should also remove `Get_Capacity`.  It's no use to know the
capacity if you can't manipulate it.

> * Fetch

Looks good.  There's a typo in the docs:

    /** Fetch the element at <code>tick</tick>.

I'd fix it myself but don't want to interfere with patches in the pipeline.

> * Shallow_Copy
> * Clone

Per recent discussions, this should be just Clone with shallow semantics.

> * Sort

I think we should simplify this method to always use the element's comparison
routines.  There is only one place in Lucy were we currently call it supplying
the comparison function.

Additionally, I think we should change the internals to use mergesort instead
of quicksort and provide a "stable sort" semantics guarantee in the API.
Getting unstable sort results with quicksort can result in hard-to-understand
or hard-to-reproduce bugs.  In the worst case of an OOM from the additional
malloc for mergesort (which is unlikely because the elements occupy much more
space than the backing array), the problem will be comparatively easier to
identify and fix.

> Regarding the Gather method: I think VA_Gather_Test_t could need an
> additional "elem" parameter. If we want to make Gather public, we should
> also think about adding other functional style methods like Map. But these
> methods don't map to C very well, so I'd suggest to leave them out for now.

Yes, let's just remove Gather.  It's only used in one place in Lucy, and it's
trivial to replace.

> Regarding out-of-bounds offsets and lengths: This already came up when
> discussing the String API. I think the consensus was that out-of-bounds
> offsets should be clamped to the start or end of the VArray instead of
> throwing an error.

+1

And everything works already that way.

> Using uint32_t instead of size_t for array ticks is OK, but I don't have a
> strong opinion on this.

Per recent discussions and CLOWNFISH-35, this is now size_t.

Marvin Humphrey

Reply via email to