On Thu, Jul 8, 2010 at 2:41 PM, Rob Speer <rsp...@mit.edu> wrote:
> On Thu, Jul 8, 2010 at 2:27 PM, Skipper Seabold <jsseab...@gmail.com> wrote:
>> On Thu, Jul 8, 2010 at 1:35 PM, Rob Speer <rsp...@mit.edu> wrote:
>>> Your labels are unique if you look at them the right way. Here's how I
>>> would represent that in a datarray:
>>> * axis0 = 'city', ['Austin', 'Boston', ...]
>>> * axis1 = 'month', ['January', 'February', ...]
>>> * axis2 = 'year', [1980, 1981, ...]
>>> * axis3 = 'region', ['Northeast', 'South', ...]
>>> * axis4 = 'measurement', ['precipitation', 'temperature']
>>>
>>> and then I'd make a 5-D datarray labeled with [axis0, axis1, axis2,
>>> axis3, axis4].
>>>
>>
>> Yeah, this is what I was thinking I would have to do, but it's still
>> not clear to me (I have trouble trying to think in 5 dimensions...).
>> For instance, what axis holds my actual numeric data?
>>
>> axis4, with a "precipitation" tick?
>
> Yep, that's what I was suggesting. Or you could have two different 4-D
> matrices, one whose values are precipitation and one whose values are
> temperatures.
>
>>> Now I realize not everyone wants to represent their tabular data as a
>>> big tensor that they index every which way, and I think this is one
>>> thing that pandas is for.
>>
>> This is kind of where I would like the divide to be between user and
>> developer. On top of all of this, I would like to see a __repr__ or
>> something that actually spits out a 2d spreadsheet-looking
>> representation. It would help me stay sane I think. Fernando's nice
>> 3D graphic only can go so far as a mental model (for me at least).
>
> Divisi2 uses a 2D labeled representation as its __str__ -- an example
> is at http://csc.media.mit.edu/docs/divisi2/sparse.html
>
> I could port this onto datarray. I was holding off because I was
> unsure about how to represent the N-d case, but I realize now that
> showing the entries in this kind of 2-D tabular format could actually
> be a really intuitive way to do it.
>
+1. When you first showed the printed divisi array for the movie data
I definitely had an "aha" moment.

>> Mix-ins sounds reasonable to me as long as this could easily be
>> accomplished. Ie., why use csr? Can you go between others? Are the
>> sparse matrices reasonably stable given recent activity? Not
>> rhetorical questions, I don't use sparse matrices much.
>
> These are good questions.
>
> I ended up using PySparse instead of scipy.sparse, because SciPy 0.7's
> sparse matrices weren't ready to support many important operations,
> particularly slicing. SciPy 0.8's sparse matrices look much better,
> and I may transition to using them once it's released.
>
> When planning future features of NumPy, of course, we should assume
> SciPy's sparse matrices do what we want (and possibly fix them if they
> don't).
>
> csr_matrix was just an example. I think there would have to be
> separate classes for labeled csr_matrices, labeled lil_matrices, and
> so on, supporting all the usual methods for converting between them.

Ok, sounds good to me. Just wanted to make sure.

Skipper
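
For anyone else who has trouble picturing the 5-D layout discussed above, here is a rough sketch in plain NumPy. It is not the datarray API: the `axes` list, the `index_for` helper and the `format_table` helper are hypothetical names made up for illustration, the tick lists are cut down to two entries each, and the single value stored is a placeholder rather than real weather data. The idea is just that each axis carries a name and a list of ticks, the 'measurement' axis selects which quantity you are looking at, and fixing all but two axes leaves a 2-D table that can be printed spreadsheet-style.

```python
import numpy as np

# Axis names and ticks, following the layout proposed in the thread.
# The tick lists are shortened, illustrative stand-ins.
axes = [
    ("city", ["Austin", "Boston"]),
    ("month", ["January", "February"]),
    ("year", [1980, 1981]),
    ("region", ["Northeast", "South"]),
    ("measurement", ["precipitation", "temperature"]),
]

# One array dimension per labeled axis; NaN marks "no observation".
shape = tuple(len(ticks) for _, ticks in axes)
data = np.full(shape, np.nan)


def index_for(**labels):
    """Hypothetical helper: turn axis=tick pairs into a NumPy index tuple.

    Axes that are not mentioned are left as full slices.
    """
    known = {name for name, _ in axes}
    unknown = set(labels) - known
    if unknown:
        raise KeyError(f"unknown axis name(s): {sorted(unknown)}")
    index = []
    for name, ticks in axes:
        if name in labels:
            index.append(ticks.index(labels[name]))
        else:
            index.append(slice(None))
    return tuple(index)


def format_table(table, row_ticks, col_ticks):
    """Hypothetical helper: render a 2-D array with row and column labels."""
    width = max(len(str(t)) for t in list(row_ticks) + list(col_ticks)) + 2
    header = "".join(str(c).rjust(width) for c in [""] + list(col_ticks))
    lines = [header]
    for tick, row in zip(row_ticks, table):
        cells = [str(tick).rjust(width)]
        cells += [f"{value:.1f}".rjust(width) for value in row]
        lines.append("".join(cells))
    return "\n".join(lines)


# Store one placeholder value (not real data) under a full set of labels.
data[index_for(city="Boston", month="January", year=1980,
               region="Northeast", measurement="precipitation")] = 1.0

# Fixing three of the five axes leaves a 2-D (city x month) table.
table = data[index_for(year=1980, region="Northeast",
                       measurement="precipitation")]
print(format_table(table, axes[0][1], axes[1][1]))
```

The alternative mentioned in the thread, two separate 4-D arrays, would amount to taking `data[index_for(measurement="precipitation")]` and `data[index_for(measurement="temperature")]` and keeping them apart.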