On 6/13/12 1:12 PM, Nathaniel Smith wrote:
> your-branch's-base-master but not in your-repo's-master are new stuff
> that you did on your branch. Solution is just to do
>     git push <your github remote name> master
Fixed, thanks.

> Yes, of course we *could* write the code to implement these "open"
> dtypes, and then write the documentation, examples, tutorials, etc. to
> help people work around their limitations. Or, we could just implement
> np.fromfile properly, which would require no workarounds and take less
> code to boot.
>
> [snip]
>
> So would a proper implementation of np.fromfile that normalized the
> level ordering.

My understanding of the impetus for the open type was sensitivity to the performance cost of having to make two passes over large text datasets. We'll have to get more feedback from users here and input from Travis, I think.

> categories in their data, I don't know. But all your arguments here
> seem to be of the form "hey, it's not *that* bad", and it seems like
> there must be some actual affirmative advantages it has over PyDict if
> it's going to be worth using.

I should have been more specific about the performance concerns. Wes summed them up, though: better space efficiency, and not having to box/unbox native types.

>> I think I like "categorical" over "factor" but I am not sure we should
>> ditch "enum". There are two different use cases here: I have a pile of
>> strings (or scalars) that I want to treat as discrete things
>> (categories), and: I have a pile of numbers that I want to give
>> convenient or meaningful names to (enums). This latter case was the
>> motivation for possibly adding "Natural Naming".
>
> So mention the word "enum" in the documentation, so people looking for
> that will find the categorical data support? :-)

I'm not sure I follow. Natural Naming seems like a great idea for people who want something like an actual enum (i.e., a way to avoid magic numbers). We could even imagine some nice with-hacks:

    colors = enum(['red', 'green', 'blue'])

    with colors:
        foo.fill(red)
        bar.fill(blue)

But natural naming will not work with many category names ("VERY HIGH") if they have spaces, etc. So, we could add a parameter to factor(...)
that turns on and off natural naming for a dtype object when it is created:

    colors = factor(['red', 'green', 'blue'], closed=True, natural_naming=False)

vs.

    colors = enum(['red', 'green', 'blue'])

I think the latter is better, not only because it is more parsimonious, but because it also expresses intent better. Or we can just not have natural naming at all, if no one wants it. It hasn't been implemented yet, so dropping it would be a snap. :) Hopefully we'll get more feedback from the list.

>>> I'm disturbed to see you adding special cases to the core ufunc
>>> dispatch machinery for these things. I'm -1 on that. We should clean
>>> up the generic ufunc machinery so that it doesn't need special cases
>>> to handle adding a simple type like this.
>>
>> This could certainly be improved, I agree.
>
> I don't want to be Mr. Grumpypants here, but I do want to make sure
> we're speaking the same language: what "-1" means is "I consider this
> a show-stopper and will oppose merging any code that does not improve
> on this". (Of course you also always have the option of trying to
> change my mind. Even Mr. Grumpypants can be swayed by logic!)

Well, a few comments. The special case in array_richcompare is due to the lack of string ufuncs. I think it would be great to have string ufuncs, but I also think they are a separate concern and outside the scope of this proposal. The special case in arraydescr_typename_get is there for the same reason as the datetime special case: the need to access dtype metadata. I don't think you are really concerned about these two, though?

That leaves the special case in PyUFunc_SimpleBinaryComparisonTypeResolver. As I said, I chafed a bit when I put that in. On the other hand, having dtypes with this extent of attached metadata, and potentially dynamic metadata, is unique in NumPy. It was simple and straightforward to add those few lines of code, and it does not affect performance.
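(To make the space/boxing point above concrete: a categorical array can be stored as one small-integer code per element plus a single copy of each level, so no per-element Python objects are involved. This is just an illustrative sketch in plain NumPy; the `levels`/`codes` layout and names are mine, not the proposed dtype's internals.)

```python
import numpy as np

# One copy of each level, and one byte per observation:
levels = np.array(['red', 'green', 'blue'])
codes = np.array([0, 2, 2, 1, 0], dtype=np.int8)

# Decoding is a single vectorized fancy-index, with no
# per-element boxing/unboxing of Python strings:
decoded = levels[codes]   # ['red', 'blue', 'blue', 'green', 'red']

# Comparisons can operate on the integer codes directly:
blue_code = np.where(levels == 'blue')[0][0]
mask = codes == blue_code  # [False, True, True, False, False]
```

Compare this with an object array of strings, which pays a full Python object per element and boxes on every access.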
How invasive would the changes to the core ufunc machinery need to be to accommodate a type like this more generally? I took the easy way because I was new to the NumPy codebase and did not feel confident mucking with the central ufunc code. However, maybe the dispatch can be accomplished easily with the casting machinery. I am not so sure; I will have to investigate. Of course, I welcome input, suggestions, and proposals on the best way to improve this.

>> I'm glad Francesc and Wes are aware of the work, but my point was that
>> that isn't enough. So if I were in your position and hoping to get
>> this code merged, I'd be trying to figure out how to get them more
>> actively on board?

Is there some other way besides responding to and attempting to accommodate technical needs?

Bryan

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion