Jonathan March writes:

> Fernando Perez proposed a NumPy enhancement, an ndarray with named axes,
> prototyped as DataArray by him, Mike Trumpis, Jonathan Taylor, Matthew
> Brett, Kilian Koepsell and Stefan van der Walt.

I haven't had a thorough look at it, but this work, as well as the others listed on the 'NdarrayWithNamedAxes' wiki page, is similar in spirit to some numpy extensions I've been developing. You can find the code and some initial documentation at:

    https://people.gso.ac.upc.edu/vilanova/doc/sciexp2

I was not planning to announce it until around 1.0, as the numpy structures are still crude and lack some operations for dynamically extending the structure, both in shape and in the number of fields of each record (I have some fixes that still need to be committed). But after seeing some related announcements lately, I think we might all benefit from trying to join ideas and efforts.

I'll try to briefly explain, with an example, the part that is related to numpy (that is, the third frontend that appears in the "User Guide": 'plotter', which currently has documentation that is worse than poor).

Suppose you have a set of benchmarks that have been simulated with different simulator parameters, such that you have one result file for each executed combination of the "variables":

  * benchmark
  * parameter1
  * parameter2

Of course, for each execution you'll also have multiple results (what I call "valuenames"; in fact, simply fields in a record array).

NOTE: scripts for such executions can be generated with the first frontend ('launchgen').

Then you can find and extract those results (package 'sciexp2.gather') and organize them into an N-dimensional 'Data' object (package 'sciexp2.data'), where the first dimension has (for example) the combinations of "parameter1-parameter2" values, and the second dimension contains one element for each benchmark (method 'sciexp2.data.Data.reshape').

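To make the layout above more concrete, here is a minimal sketch using plain numpy only. It is not the sciexp2 API: the field names ('cycles', 'misses'), the configuration labels and the 'dims' list are all made up for illustration, and the per-dimension metadata is just a Python list kept next to the array.

```python
import numpy as np

# Hypothetical "valuenames": the fields of each result record.
record_dtype = np.dtype([("cycles", np.int64), ("misses", np.int64)])

# Hypothetical labels for each dimension.
configs = ["p1=10-p2=a", "p1=10-p2=b", "p1=100-p2=a"]  # parameter1-parameter2
benchmarks = ["bench1", "bench2"]

# Flat array, one record per result file, as a gather step might produce it.
flat = np.zeros(len(configs) * len(benchmarks), dtype=record_dtype)
flat["cycles"] = np.arange(flat.size) * 100
flat["misses"] = np.arange(flat.size)

# First dimension: configurations; second dimension: benchmarks.
data = flat.reshape(len(configs), len(benchmarks))

# Per-dimension metadata, carried around by hand in this sketch.
dims = [configs, benchmarks]

# Translate labels into integer indexes through the metadata.
i = dims[0].index("p1=10-p2=b")
j = dims[1].index("bench2")
print(data[i, j]["cycles"])    # one value of one record
print(data["misses"].shape)    # a whole field keeps the 2-D shape
```

The point of keeping the labels next to the array is only to show where the label-to-integer translation could come from; a real implementation would have to keep that metadata consistent across reshapes and slices.
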
Now you can index/slice the structure with integers (as always) _as well as_ with:

  * strings: simple indexing as well as slicing
  * "filters": slicing with a stepping

These are translated into integers through the "metadata" (benchmark name and/or values of the two parameters), stored in 'sciexp2.data.Dimension' objects.

For example, to get the numbers of the tests where parameter1 is between 10 and 100, and just for the benchmarks named 'bench1' and 'bench2':

    data[::"10 < parameter1 && parameter1 < 100",["bench1", "bench2"]]

There is a third package, extending matplotlib, that I have not uploaded (nor fully developed). It is meant to use the dimension and record metadata in the Data object so that data can be easily plotted. It extracts labels for axes and legends from the metadata, and can "expand" operations. For example:

  * Plot one figure for each benchmark by simply declaring the figure as to be "expanded" through the 'benchmark' variable.
  * Plot multiple lines/bars/whatever with a single plot command, like "plot such and such for each benchmark", or "plot such and such for each configuration and cluster by benchmark name".

More extensive examples can be seen at the following URL, which is from a much older version that used neither numpy nor matplotlib and provided a somewhat functional API (SIZE, CPREFETCH, RPREFETCH and SIMULATOR are execution parameters in these examples; the fun starts at line 78):

    https://projects.gso.ac.upc.edu/projects/sciexp2/repository/revisions/200/entry/progs/sciexp2/tags/0.5/plotter/examples/01-spec-figures.cfg

Finally, some things that have been bugging me about numpy are:

  * My 'Data' object is similar to a 'recarray', such that record elements (what I call "valuenames") can be accessed as attributes. But to avoid the cost of a recarray, I use an ndarray with records. This has the unfortunate effect that "valuenames" cannot be accessed as attributes on a record, but only when it really is a 'Data' object.
    I tried to add some methods to numpy.void from my Python code to access record fields as attributes, but of course that's not possible.

  * I'd like to associate extra information with a dtype, instead of manually carrying it around on every operation that accesses a record field. Namely:

      * a description, so that it can be automatically used for axis/legend labels in matplotlib.
      * unit information, so that the units of results can be automatically computed when operating with numpy, and later extracted when plotting with matplotlib. For this, existing packages like 'units' in PyPI could be used.

  * The ability to operate on records instead of on separate record fields, so that I can write:

        b = a[0] + a[1]

    instead of:

        b_f1 = a[0]["f1"] + a[1]["f1"]
        b_f2 = a[0]["f2"] + a[1]["f2"]

    whenever possible.

Comments are welcome.

apa!

-- 
"And it's much the same thing with knowledge, for whenever you learn
something new, the whole world becomes that much richer."
-- The Princess of Pure Reason, as told by Norton Juster in The Phantom
Tollbooth

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion