On 24/04/2015 04:15, Marvin Humphrey wrote:
On Sat, Apr 18, 2015 at 9:10 AM, Nick Wellnhofer <[email protected]> wrote:
Lucifers,

Let's discuss the public API of VArray.

Big thing first: I think this class should be renamed.  `VArray` is not
intuitive.

I was going to suggest `Array`.  After some thought, I'm leaning towards
`Vector` a la C++ std::vector, because in Clownfish-flavored C "array" can be
a C array, while "vector" would always a Clownfish Vector.

+1 for `Vector`.

* `Push_VArray`

I suggest renaming this method to `Push_All`.

+1

* Grow

We can probably just remove this method. It's a minor optimization at best.

We use effectively the same algo as Python for amortizing reallocations;
here's someone benchmarking Python's algo:

     http://stackoverflow.com/a/311833

     simple append 0.0102
     pre-allocate  0.0098

     Conclusion. It barely matters.

If `Grow` goes, we should also remove `Get_Capacity`.  It's no use to know the
capacity if you can't manipulate it.

Instead of removing methods, we can always consider to leave them undocumented. From a practical point of view, I like the idea of having methods that are not part of the public API and subject to change or removal, but might be useful to people who "know what they're doing".

Regarding `Grow`, I find the factor of 1.125 (9/8) in `Memory_oversize` extremely conservative. It seems that Python uses the same value and I'm a bit puzzled why there's such a small difference in that benchmark. Maybe other things are at play here (too few runs, too small arrays, using `append` versus indexed assigment).

Personally, I'd go with a factor of 1.5. Here are some values from other implementations:

Java ArrayList   1.5
Java Vector      2
Python list      1.125
Perl array       1.2

Maybe 1.25 is a good compromise.

* Sort

I think we should simplify this method to always use the element's comparison
routines.  There is only one place in Lucy were we currently call it supplying
the comparison function.

+1

Additionally, I think we should change the internals to use mergesort instead
of quicksort and provide a "stable sort" semantics guarantee in the API.
Getting unstable sort results with quicksort can result in hard-to-understand
or hard-to-reproduce bugs.  In the worst case of an OOM from the additional
malloc for mergesort (which is unlikely because the elements occupy much more
space than the backing array), the problem will be comparatively easier to
identify and fix.

+1. We only have to make merge mergesort work

Nick

Reply via email to