John, Any feedback ? We could discuss that during next OpenJDK workshop, but I would prefer going slowly but surely.
Laurent Le jeu. 29 nov. 2018 à 17:52, Laurent Bourgès <bourges.laur...@gmail.com> a écrit : > Hi John & Brian, > > Thanks for the proposed approach, I agree we are in the design discussion; > adding such API enhancements will take time, I said 13 to not hurry for 12 > (CSR...). > > John, you wrote me a very detailed letter, I will try to be more concise > and focus on Array.sort() API. > > Le jeu. 29 nov. 2018 à 02:12, John Rose <john.r.r...@oracle.com> a écrit : > >> >> > >> > In this renderer, I needed a fast sort algorithm on small almost sorted >> > arrays, but more important, to deal with 2 arrays: x values and their >> > corresponding edge indices (pointer like). I derived my MergeSort class >> to >> > perform 2 swaps (x & edge ptr) as Arrays.sort() or TimSort() did not >> > provide such API. >> >> There are two distinct requirements here: 1. Selecting or tweaking a >> sorting >> algorithms for particular expected patterns in the data set >> (almost-sorted). >> 2. Sorting two arrays in tandem, with one array supplying the sort key, >> and the other serving as a passively permuted payload ("PPP"). >> >> Re 1: Another pattern sometimes expected is range-bounded values, >> which may prompt consideration of bin-sort. Perhaps unique values. >> Permutation vectors have both properties. >> > > Yes, the discussion has 2 points: > - provide more sorting algorithms in the public API > - provide new sort variants handling indices > > For example, Julia has a simple sort API: > https://docs.julialang.org/en/v0.6.2/stdlib/sort/ > It provides several basic sort functions (isort, qsort, msort) but the > general sort() accepts optional custom comparison (less) & transformation > functions. > Moreover, another sortperm() function that returns a permutation vector of > indices, it accepts an optional custom comparison (less). > > > >> >> Re 2: In some cases, it's not two arrays that need the work. Sometimes >> it is one array of keys, plus an index set; this can be reduced to two >> arrays >> one of which is an int-array carrying indexes, but in some proposals you >> see >> such an int-array appearing only as an output, with an implicit input of >> the iota-vector for the other array's indexes. >> >> More deeply, sometimes the index array is concrete, but the array of keys >> is virtualized. Canonical example (for me) is a set of indexes into a >> data >> set such as a file. (These indexes are dense for compression algorithms >> like B-W, or could be sparse for lexical analysis of bulk text in situ.) >> To >> handle this properly, a good old index array is just fine, as long as the >> sort algorithm *also* accepts an int-comparator. >> >> There is no strong need to generalize to three arrays in tandem, or to >> have separate algorithms for treating additional arrays as containing >> secondary keys instead of PPP. This is because given a two-array tandem >> sort with PPP, plus an int-array sort with int-comparator, you can compose >> many other shapes at a modest footprint cost by adding two index vector >> to control the various sorts. >> > > So you propose 2 new variants: > - sort(int[]a, int[] b, low, high) where b values are permuted > - sort(int[]a, low, high, Comparator<int, int>) where b values are permuted > > I proposed the 2 array variants: sort( int[] values, int[] indices, low, > high) as it helps when the user wants both values and indices sorted > (easier use), but this can be reduced to sort(int[] indices, low, high, > comparator) as the user can use the permutation vector (indices) to reorder > values. > > I would prefer adding the more general solution sort(int[] indices, low, > high, comparator) first. > > However, having extracted the values is really helpful for performance > (locality) and simplifies the sorting algorithms but costs more data swaps. > For example in Marlin, my comparator would be: > Comparator<int, int>::compare(i,j) { > return Integer.compareTo(edges[i].x, edges[j].x); > } > However, my edge data are packed in an Unsafe buffer so it is more: > Comparator<int, int>::compare(i,j) { > return Integer.compareTo(Unsafe.getInt(addr0 + i), Unsafe.getInt(addr0 > + j) ); > } > I wonder if such lookups may be costly (random access + overheads) in > contrary to having the values given as an extra int[]. > > If you want, I can prototype later both solutions... > > >> >> Simplified sketch of generalized tandem array sort: >> >> 1. Let L be the common length of the N arrays, A[N][L]. >> 2. Create a new int index array P initialized to iota(L). >> 3. Create an int-comparator C over 0<=i,j<L which compares A[*][i] to >> A[*][j]. >> 4. Sort P using C. Now P is a permutation vector to be applied to each >> A[*]. >> 5. Create a second index array Q such that Q[P[i]] = i for all i. Q is >> P's inverse. >> 6. Tandem-sort each Q[*] (as a PPP) using a fresh copy of Q for each >> array. >> >> There is more than one way to do this; the above sketch is intended to be >> straightforward. The temp. space required is two length-L int arrays. >> > > I like you proposal, but I dislike object allocations so I would prefer > providing result arrays as arguments if possible. > In Marlin renderer, allocations are always done in its RendererContext + > XXXArrayCache (soft ref) to ensure no GC overhead. > > >> >> My point is that the "missing primitives" are something like a. reordering >> a PPP using a different array (either indexes or keys) for steering, and >> b. reordering an index vector (or other array of primitives) using a user >> supplied comparator. >> >> An iota generator would be nice too, as well as a permutation inverter. >> The tandem sort could be restricted to a "permute this array" operation, >> giving an index vector, although I don't think this simplifies the >> implementation, >> and it removes some use cases. But those are minor points. >> > > Could you illustrate that aspect ? > > >> >> > 1. I started discussing about extending the Java Sort API with Vladimir >> > Yaroslavskiy, the author of the famous Java DualPivotQuickSort, as he is >> > currently improving it. >> > >> > His coming DPQS 2018 class provides lot of sort algorithms, I propose to >> > make them public API through a facade method in Arrays class: >> > insertionSort(), heapSort(), mergeSort(), quickSort()... Of course, his >> > current code is dedicated to DPQS, but his algorithm implementation >> could >> > be generalized to propose the Best implementations as a public Java API >> > (stable & unstable sorts) >> > >> > For example in Marlin, I need trivial insertionSort() in the Helpers >> class, >> > my MergeSort could be replaced by another best implementation. >> > Let's not reinvent the wheel… >> >> Indeed, let's not. But remember that the JDK does not claim to be (and >> cannot be) a collection of all the "wheels" we might need. >> > > I quoted Julia API as it provides only 3 basic algorithms and OpenJDK > already has such implementations but not publicly exposed. That was my > first request. Ideally these implementations will be the fastest possible > (optimized).Moreover, isort() could rely on a new intrinsics to be as fast > as possible (native code) ? > > >> >> Here's an extra thought or two on tweakable algorithms: One algorithm >> doesn't >> fit all data sets although it's amazing how far we've gone in optimizing >> the JDK's >> only sort algorithm. The JDK can't support many new algorithms to go >> with sort >> (and, ahem, parallelSort), as insertionSort, mergeSort, etc., etc. Doing >> this would >> amount to having a bunch of partially usable sort routines, more like >> "tryThiseSort" >> "orTryThisIfItDidNotWork", or "dontUseThisSort". Actually documenting >> and testing >> a multi-kinded API for {insertion,merge,parallel,plain}Sort would be a >> nightmare, >> IMO, since the properties of sort algorithms are hard to explain and >> verify. >> > > That was not my proposal, just few sort implementations that have their > own interest and are commonly useful within OpenJDK at least. > > >> >> It would be better if we could define an API for sorting algorithms, that >> a provider >> could set up. Then the various API points in Arrays.*sort(*) could be >> documented >> as sugar for specific standard providers, via the API. The provider API >> could >> include not only first-class entry points for sorting various kinds of >> arrays or >> tandem array pairs, but also provide queries for a user who wanted to be >> smart >> about which provider to choose. >> > > That's a good idea, it looks like julia sort() that accepts an extra > algorithm argument to select the algorithm to use. > > >> >> > 2. New Sort API to deal with indices... >> > >> > For now, there is no mean to obtain the indices sorted on another value, >> > see my introduction: edge indices sorted by their x positions. >> > In data science, this is very useful and general. >> > >> > What about an Arrays API extension: >> > - Arrays.sort(int[] a, int[] b, low, high) or >> > Indices int[] Arrays.sortIndices(int[] a, low high), a[] left untouched >> >> Yes, good. I don't think sortIndices buys much, since the internal >> algorithm will almost certainly create an iota vector and sort it instead >> of a. I'd prefer an iota function such that this composition works >> like sortIndices: >> >> int[] indices = indexRange(low, high); >> sort(indices, (i, j) -> Integer.compare(a[i], a[j])); >> return indices; >> > > I agree, I proposed the extra sortIndices() as an helper method that > factorize your snippet. > > >> >> > - same with preallocated buffers & runs arrays >> >> I don't follow: Is this a request for tandem sorting? Off-heap sorting? >> I think the above primitives are likely to cover. >> > > I mean preallocated buffers to avoid any allocation inside sort(), to > avoid GC overhead if the application already handles its own memory > management (like Marlin) or performs tons of sort() calls. Maybe having a > SortContext class would help and could be reused accross sort() > invocations. Efficiency matters here. > > >> >> (Exception: Sorting more that 2 billion things is desirable, but will >> bust out of Java arrays. That's why I'm not proposing sorting index >> arrays of type long, at present. We need an array-like container >> of more than 2 billion entries before we graduate to long indexes.) >> >> > - alternatively, it could be expressed as Comparator<Primitive> >> > Arrays.sort(int[] a, low, high, Comparator<int, int> cmp) that sorts >> array >> > a using the given comparator. For example, a[] could be edge indices >> sorted >> > according to their edge x position... >> > This comparator is specific as it compares anything (external storage) >> at >> > given indices i,j. >> >> Exactly. Except note the limit of 2 billion. >> > > Agreed, but Java arrays are suffering this limit for so long ... > > >> >> > Finally Marlin renderer would benefit from Value types (array of >> structs), >> > as edge data are currently packed in int[] arrays, but this new Sort API >> > seems to me general & needed enough to be discussed here. >> >> Yes, value types will give better memory locality; today you have to use >> tandem sorts. Also, value types will more naturally work with >> comparators, >> where we have to do something different with int-arrays (IntComparator). >> >> Valhalla is aiming at generic specialization also. That, I believe, is >> the >> best future for sorting algorithms (and other algorithms too). A >> specializable >> template class could contain *one copy* of a full sort algorithm, which >> would >> then be specialized to arrays of a reference, primitive, or value type. >> The >> suggestion of a "sort provider API" would mesh nicely with this: The >> template >> class would implement that API. The index type (currently hardwired as >> "int") >> could be a type parameter, instantly releasing us from the 2-billion >> limit. >> Tweakable algorithm parameters (such as "likely run size" or "maximum >> value") >> could be hardwired into the template (as if by a C #ifdef) simply by >> defining >> them as template parameters (a la C++ non-type template parameters). >> > > Looks promising, I should try EA builds ... if ever I have some time ... > to port Marlin using Value types. > > >> >> A user-requested specialization of a template algorithm class would be a >> strong >> signal to the JVM that some "customized" code needs to be generated. >> Currently >> we have no such signals, leading to performance losses when heroic >> inlining >> fails to fully connect the user code to the sort algorithm code. I call >> this the >> "loop customization problem". >> >> — John >> >> P.S. Loop customization problem: >> >> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2012-September/008381.html >> http://mail.openjdk.java.net/pipermail/valhalla-dev/2016-June/001953.html >> Each "loop syndrome" for a sort or BLAS algorithm would be a distinct >> template class instance. >> >> P.P.S. Such a template algorithm class would work naturally with Arrays >> 2.0, >> which will (someday, in my dreams!) define template interfaces for >> extending >> Java arrays in new directions, notably with index types beyond int. >> >> Independently of sort algorithms, I expect an Arrays 2.0 array could be >> truly >> 2-D if its index space were pairs of ints (a value type). It might >> internally store >> its data in some blocked cache-friendly form (a square sub-array in each >> cache >> line, maybe). I wonder if there is some useful generalization of >> linear-ordered >> sorting which applies to 2-D arrays? Clearly there are many interesting >> algorithms >> on multi-D arrays which could be instantiated and tweaked using template >> algorithm >> classes. >> >> I am lost, sorry. Let's have this extra discussion in another topic. > > Best regards, > Laurent >