Hi Jack,
Looking at webrev2, your hg comment is repeated; please only fix the bug once :)
Comments in-line, but it looks like we've hashed things nearly to completion. I'm
happy the user experience should be nicely improved here for 0906.
Thank you,
Clay

On Mon, 6 Apr 2009, Jack Schwartz wrote:

> On 04/05/09 22:51, Clay Baenziger wrote:
>> Hi Jack,
>> Comments in-line.
>> Thank you,
>> Clay
>>
>> On Sat, 4 Apr 2009, Jack Schwartz wrote:
>>
>>>
>>> Hi Clay.
>>>
>>> Thanks for your in-depth review, and for your comments and suggestions.
>>> My responses are inline.
>>>
>>> Delta webrev is at:
>>> http://cr.opensolaris.org/~schwartz/090402.1/webrev.incr_1_2/
>>> New webrev is at: http://cr.opensolaris.org/~schwartz/090402.1/webrev/
>>>
>>> Clay Baenziger wrote:
>>>> Hi Jack,
>>>> Thank you very much for doing this work! And thank you too for all
>>>> the cool ideas as we've talked back and forth on these changes. My ideas
>>>> are below:
>>>>
>>>> publish-manifest:
>>>> *Instead of using 9999... and 0 for infinity and a small value,
>>>> talking with Drew Fisher here in BRM, we should be using:
>>>> sys.maxint and -sys.maxint-1. I was originally not
>>>> sure if this worked on Windows but it'll work on the Unix world
>>>> and Windows. So, the hack should go away. (lines 204 & 205 could
>>>> go away then)
>>> I have a few comments about this.
>>>
>>> 1) Since we convert back and forth between integers and digit strings, I
>>> don't think sys.maxint will be big enough.
>>> import sys
>>>>>> print sys.maxint
>>> 2147483647
>>>
>>>>>> a=2147483647+1
>>>>>> print a
>>> 2147483648
>>>
>>> For example, if we store an IP address of 255.255.255.255 (I know this is
>>> unusual, but it is possible...) then the number from the digit string
>>> without dots is way larger than maxint.
>>>
>>> 2) The maximum integral value is greater than maxint, and we could end up
>>> with a stored value > maxint.
>>>
>>> The python reference at www.python.org says that even a long is
>>> "implemented using long in C, which gives them at least 32 bits of
>>> precision". To me that sounds like some platforms may not support more
>>> than 32 bits. That said, comparing something with "999......999," which
>>> we are already doing, is ultimately going to go over 32 bits too...
>>>
>>> I propose we replace INFINITY="999999....999" with
>>>
>>> INFINITY=str(0xFFFFFFFFFFFFFFFF)
>>>
>>> which is the digit string "18446744073709551615" or a 64 bit long
>>>
>>> which is more reasonable, and is still more than large enough to cover
>>> 12-digit hex numbers. It is still not perfect though, since python lets
>>> you do the following:
>>>
>>>>>> a=str(0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF)
>>>>>>
>>>>>> print a
>>> 115792089237316195423570985008687907853269984665640564039457584007913129639935
>>>
>>>
>>> The real solution, which should be documented in 7876, is to store digit
>>> strings in the database instead of integers, as we talked about Thursday
>>> in our video call. This not only handles the current state of affairs,
>>> but plans ahead for if/when we implement storage of IPv6 addresses (with
>>> 24 (?) hex digits). I will be sure to update that bug report with this
>>> information.
>>
>> I agree sys.maxint was not the right answer. It looks like Python 2.6
>> introduced a sys.float_info.max (http://docs.python.org/library/sys.html)
>> but we're still using 2.4.x. How about INFINITY=long(0xFFFFFFFFFFFFFFFF),
>> as I'm not sure why we'd want the str object for numeric comparisons?
> The original code had a string of 9's: "9999999...." to represent infinity
> and "0" (as opposed to 0) to represent the lower bound. What I've done keeps
> the same data types for bounds as were present before.
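[Aside, for illustration only: the sketch below is not from the webrev, and the
names (bound_to_long, ranges_overlap) are made up. The point is just that once
the bound strings are converted with long(), Python compares them with
arbitrary precision, so the 64-bit sentinel and values like 255255255255
behave the same on any platform:

    # Hypothetical sketch of the overlap check being discussed; not webrev code.
    INFINITY = str(0xFFFFFFFFFFFFFFFF)    # "18446744073709551615"
    LOWER_BOUND = "0"

    def bound_to_long(value, default):
        # The keyword "unbounded" falls back to the matching sentinel string.
        if value == "unbounded":
            value = default
        return long(value)

    def ranges_overlap(min_a, max_a, min_b, max_b):
        # True if the closed ranges [min_a, max_a] and [min_b, max_b] intersect.
        lo_a = bound_to_long(min_a, LOWER_BOUND)
        hi_a = bound_to_long(max_a, INFINITY)
        lo_b = bound_to_long(min_b, LOWER_BOUND)
        hi_b = bound_to_long(max_b, INFINITY)
        return lo_a <= hi_b and lo_b <= hi_a

A hex MAC string would need the base passed explicitly, e.g. long(value, 16),
but the overflow story is the same either way the bounds are stored, digit
string or long.]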
Okay, it's goofy (I should have used a number format, not a string), but I
believe Python deals with it appropriately. If you've checked ranges (i.e. a
manifest in the database has range a-b, you try to add a manifest with range
a+1 to b-1, and it errors), then I'm okay.

>>
>>>> *line 183 comment nit: "check [that] ranges are valid"
>>> OK.
>>>
>>> While doing this, I noticed comment references of "-inf". I changed these
>>> references to 0, since that's what the code used before I started working
>>> on this bug, and what it still uses. If you think it should be "-inf"
>>> instead then I'll file a bug to fix this later. Please let me know.
>>
>> I think the comments(?) which had -inf in them are more accurate, as a
>> negative number would still be okay. (Though we don't have a criteria at
>> the moment where a negative number would make sense -- and I can't come up
>> with such a case -- but the code currently works that way.)
> I respectfully disagree. The code has "0" for a lower bound (see
> publish-manifest.py line 205 in slim_source or line 207 in my changed
> version), which excludes negative numbers, and the comments reflect this. If
> we change the code, then we can change the comments. I'd like to leave
> things as they are.
>
> For now, though, I suggest we keep this code as it is since, as you say,
> there are no criteria where negative numbers make sense. I'll file a bug to
> document that negative-valued criteria should be accommodated. I don't want
> to implement that now though, as it is too risky and too late.

Okay, this isn't a regression; however, the database uses a NULL, which would
work for a negative value (so range detection may be broken for negative
values).

>>>> *In many of the checks for unbounded since forcing case is painful
>>>> to the user, could you do a value.lower() or
>>>> manifestCriteria.lower() before doing the compare?
>>> All "unbounded"s are already stored as lower case. The conversion is done
>>> in verifyXML.py, here at lines 250-254:
>>>
>>>     # Handle "unbounded" keyword; and pass lowercase
>>>     lowered_value = one_value.lower()
>>>     if (lowered_value == "unbounded"):
>>>         new_values += lowered_value
>>
>> Ah, okay, I had seen that and not thought on it much further. Perhaps a
>> comment referring to where the case processing is done would be good?
> I changed the comment to say that "unbounded"s are already converted to lower
> case during manifest processing.
>
> In general, I shy away from referencing a particular file in the comment of
> another file, since that introduces a maintainability issue. If someone
> changes the former file, they probably wouldn't know to change the comment in
> the latter file, making the comment incorrect and confusing.

Yes, I just feel that if someone changes the lower-case check in the other
file, it'd be good to know that somewhere upstream it was once done, in case
it stops being done. (Perhaps just note that the value is expected to come in
lower-case.)

>>>>
>>>> AI_database.py:
>>>> *This is perhaps a nit since it's more of a stylistic and reuse
>>>> idea for formatValue(), I'd like to see this take a dictionary of
>>>> keys and values, so that it can handle all keys and values and
>>>> only process the few it needs as opposed to running for every key.
>>>> I.e.:
>>>> for key in ("mac", "ipv4", "ipv6"...):
>>>>     for prefix in ('', 'MIN', 'MAX'):
>>>>         try:
>>>>             values[prefix+key]...
>>>>             [...]
>>>>             continue
>>>>         except KeyError:
>>>>             pass
>>>> Of course, however you want, as this pseudocode is still kludgy,
>>>> but I'd like to see it be more efficient than just one key
>>>> per call of the function. (Then, list-manifest.py line 174 could
>>>> just have:
>>>> criteria=formatValues(criteria.keys(),criteria) and
>>>> webserver.py line 124 could just have:
>>>> crit=formatValues(crit.keys(),crit), for example.)
>>> I tried this and ran into the problem of trying to change an immutable
>>> value*. This means I'd have to create a new dictionary with new values.
>>> This can be done, but I'm thinking it is probably more efficient to leave
>>> the code as is: since there aren't that many criteria to check, there are
>>> not that many formatValue calls to make. A few calls to formatValue seems
>>> more efficient than recreating a dictionary.
>>>
>>> * dict[key]=newValue gave
>>> TypeError: object does not support item assignment
>>> and
>>> trying to do del dict[key] before reassignment, gave
>>> AttributeError: 'pysqlite2.dbapi2.Row' object has no attribute 'iteritems'
>>> when I tried to reassign
>>
>> Ah, yes, they're all the PySQLite2 objects in that dictionary, so you'd
>> have to delete the key and re-create it. I think you're right, let's move
>> on.
> Thanks.
>>
>>>> *Line 530: Elsewhere MIN or MAX are stripped with
>>>> var.replace('MIN', '', 1) though I like the key = key[3:]; perhaps
>>>> consistency is better? (Feel free to ignore this if you think you
>>>> should, since you have more experience than I do in maintaining big
>>>> code bases.)
>>> OK.
>>>> *Line 533: This padding will produce a value like 0001722024199,
>>>> whereas the padding necessary is by octet, 172020024199 (for an
>>>> address of 172.20.24.199). Also, I'm not sure why you're padding
>>>> values here, since the string is padded in the database already.
>>> Oops. Thanks. Removed the padding from MAC addresses.
>>
>> I forgot to point out the same for line 537 (latest line number) doing the
>> same with IPv4.
> IPv4 addresses are stored as numbers (vs MAC addresses as strings) and so the
> extra processing to handle left-zero-padding is needed. For example, an IP
> address of 001.002.003.004 will get stored as the long value of 1002003004.
> This is why I have 537 using %12.12d as a format specifier: to add back the
> leading two zeros during stringification. (Gotta love English :) ) Then I
> have a 12 digit string I can add dots to in the correct places.

Ah, I'm happy you're on top of things here; I hadn't even thought about that.

>>
>>>>
>>>> Criteria_schema.rng:
>>>> *Thank you for setting the style to a more consistent one with how
>>>> other schemas have developed in the gate
>>>> *Line 55,58 Nit: I find that unless a block is really big, making it a
>>>> name reference detracts from the schema readability with all the
>>>> jumping around. Do you have a tool to expand or collapse them
>>>> while reading?
>>> I don't have a tool and I admit that schema readability can be difficult
>>> with many small blocks. IMHO, I think it's worse without them though. I
>>> also think that taking a more "programmatic" approach to schemas (treating
>>> blocks in the schema like functions are in a program) helps break the
>>> schema into more manageable, more readable units. The rules I code by
>>> apply well to schemas.
>>> Some of these are:
>>> 1) Lines are 80 chars
>>> 2) Nesting levels get indented by one tab and continuations by four spaces
>>> 3) When nesting gets so deep that code gets all scrunched up on the right,
>>> it's time to revamp how that block of code is written, or else it's time
>>> to create a function (which lets me then start on the left again).
>>> These rules fit me well for "coding" schemas too, and seem to make them
>>> more manageable for me.
>>>>
>>>> *Lines 54-63: We only support one SC manifest and we need one SC
>>>> manifest with an AI manifest; let's take the zeroOrMore tag off
>>>> and wrap the AI and SC manifest in an optional tag:
>>>> <optional>
>>>>     <ref name="nm_ai_manifest_src"/>
>>>>     <ref name="nm_sc_manifest_src"/>
>>>> </optional>
>>> Done.
>>>>
>>>> verifyXML.py:
>>>> *For __checkIPv4 and __checkMAC, it's too bad these don't return
>>>> the same error one would get for a malformed MAC or IPv4 were
>>>> they in a range.
>>> Yes, I agree. But at least their errors are caught and processed in an
>>> understandable way.
>>>> If we used LXML to parse the values they could return the same error,
>>>> by doing something like (for IPv4) -- this would also help when
>>>> we need to support IPv6 since they're a lot more complex:
>>>> # Load in the schema representation of an IPv4 address
>>>> s = StringIO('''\
>>>> <element name="a" xmlns="http://relaxng.org/ns/structure/1.0">
>>>>   <data type="token">
>>>>     <param name="pattern">
>>>> ((25[0-5]|2[0-4][0-9]|[01]?[0-9]?[0-9])\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9]?[0-9])
>>>>     </param>
>>>>   </data>
>>>> ''')
>>>> # Parse schema
>>>> ipv4_schema=etree.parse(s)
>>>> relaxng = lxml.etree.RelaxNG(ipv4_schema)
>>>> # from __checkIPv4, values is the IP address in question
>>>> data=StringIO(values)
>>>> try:
>>>>     root = lxml.etree.parse(data)
>>>> except lxml.etree.XMLSyntaxError, e:
>>>>     return e.error_log.last_error
>>>> [then you'd just need to do the rest of the splitting and padding]
>>>> which could be as simple as (in fact this would be preferable
>>>> to calling newval += ... even in the current form):
>>>> return(''.join([octet.zfill(3) for octet in values.split(".")]))
>>> While this may be true, I'd rather skip this for now due to time
>>> constraints. If you feel it is important enough, please file a bug.
>>>>
>>>> *Line 161: Nit: Could you hand in criteriaRoot and the AI database
>>>> to make the dependencies of this function more obvious when
>>>> called?
>>> Done.
>>>> *Lines 216-220: The schema should have line 146 changed to:
>>>> <data type="string"/> from <text/> and then this is redundant and
>>>> handled in XML validation
>>> I think you meant line 123 here, the value type of <element name="value">,
>>> since the other text/ entries were for manifest names.
>>>
>>> When I change this, lines 216-220 still fire when I do this in the
>>> manifest:
>>>
>>> <ai_criteria name="ipv4">
>>>     <value>
>>>         1.2.3.4 5.6.7.8
>>>     </value>
>>>
>>> so lines 216-220 have their use. Leaving them in.
>>>
>>> I did a fast web search and you are right in that "string" does not do
>>> whitespace normalization and "text" does. However, it may be that
>>> normalization strips whitespace from the ends of a string, not from the
>>> middle, which is why I saw no effect in my test.
>>
>> I had not thought about strip() splitting on a string with whitespace in
>> it. I think we need to support such a string (i.e.
>> if we support output from smbios(1) for a machine's manufacturer and model
>> we'd have such values as "Sun Microsystems" and "Ultra 40" or "ThinkPad
>> X200". Please remove or comment the check so that it can be identified as
>> allowing only strings with no white space (please update bug 7955 too).
> Hmmm... OK. I tried this and it works. This potentially presents other
> issues around missing checks; not sure. There are three items at the moment
> which are values, not ranges: arch, cpu and platform. From their
> definition, it seems OK that arch and platform take spaces; not sure about
> CPU. (Is this number of CPUs, model of CPU, ???) If it represents number of
> CPUs (which it probably does), then we don't want spaces. I'll modify to
> check only CPU "value"-values.
>
> Webrev updated: http://cr.opensolaris.org/~schwartz/090402.1/webrev/
>
> New delta webrev at:
> http://cr.opensolaris.org/~schwartz/090402.1/webrev.incr_2_3/
>
> Please bless.
>
> Thanks,
> Jack
>>
>> Again, the LXML parsing from above could be used for IP and MAC criteria
>> and MEM could be validated against a <data type="integer"/> definition,
>> etc.
>>
>>>> *Line 222-224: This is not necessary if a simple check is put in
>>>> publish-manifest (original line numbers 116 and 223) to report a
>>>> criteria is not a value criteria or range criteria, respectively.
>>> Part of the reason for this function is to do validation which is missed
>>> by the schema, so the check seems appropriate here.
>>
>> Okay
>>
>>>> *Line 225-228: In my testing with a range I can't get more than
>>>> two lines to pass the XML validator; is this necessary?
>>> Strictly speaking you are correct. I had put this in since I thought I
>>> remembered criteria as being extensible and wanted this as a double check
>>> for user errors in the criteria manifest. However, thinking more,
>>> extending criteria is difficult for the user to do, since there would need
>>> to be new database fields added. Since the user cannot do this, I'll
>>> remove lines 225-228.
>>>> *Line 229-232: This should be handled in the schema, as only a
>>>> <range> or <value> tag are allowed?
>>> OK. Removed.
>>>>
>>>> Otherwise, everything looks reasonable. Let me know what you think of
>>>> these observations, as there are a lot of ideas here and you have better
>>>> experience than I do about which should be pursued and which are nits.
>>>> Please call me with questions too.
>>> Thanks again for your review. Thanks also for teaching me some new python
>>> tricks, both during the code review and while studying your code during my
>>> initial coding process.
>>>
>>> Thanks,
>>> Jack
>>>>
>>>> Thank you,
>>>> Clay
>>>>
>>>> On Thu, 2 Apr 2009, Jack Schwartz wrote:
>>>>
>>>>> Hi everyone.
>>>>>
>>>>> Here is a webrev of my proposed fixes for:
>>>>> 4325 Better syntactic treatment of IP and MAC address AI criteria
>>>>>
>>>>> A medium-level description of the changes is in the bug report:
>>>>> http://defect.opensolaris.org/bz/show_bug.cgi?id=4325
>>>>>
>>>>> Webrev:
>>>>> http://cr.opensolaris.org/~schwartz/090402.1/webrev/
>>>>>
>>>>> Testing done:
>>>>>
>>>>> Ran publish-manifest and verified with webserver:
>>>>> - MAC: min/unbounded, IP: unbounded/max, network: min/unbounded,
>>>>> mem: single <value> for range, minMAC has single digit btwn colons
>>>>> - range with three values: correctly failed validation
>>>>> - range with two unbounded values: correctly failed validation
>>>>> - range with 0 minimum: correctly passed validation
>>>>> - single IP addr given as a <range>: correctly failed validation
>>>>> - pair of values given as a <value>: correctly failed validation
>>>>> - 2 values for non-range criterion, surrounded by <value>: correctly
>>>>> failed validation
>>>>> - single value for non-range criterion, surrounded by <value>: passed
>>>>> validation
>>>>> - 2 values for non-range criterion, surrounded by <range>: correctly
>>>>> failed validation
>>>>> - Try a combination with a non-numeric value (arch=sparc): worked
>>>>> Also verified list-manifests displayed data correctly.
>>>>>
>>>>> I'm requesting that Clay review, as he's most familiar with the code
>>>>> I've changed and he has a heads-up that it's coming. Others are invited
>>>>> to review as well.
>>>>>
>>>>> Tomorrow I'll verify that an install starts as it should, but I have
>>>>> confidence it will work, as the webserver output looks fine.
>>>>>
>>>>> Sorry to rush. I'm requesting a review ASAP, by COB tomorrow if
>>>>> possible. I'm supposed to be done on Friday, but to give less than 1 day
>>>>> for this review is not really realistic.
>>>>>
>>>>> Thanks,
>>>>> Jack
>>>>>
>>>
>>>
>>>
>>
>> Thank you again for hearing my suggestions on your work. Thank you also for
>> all of the nifty ideas which your work has brought up.
>>
>> -Clay
>
>
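P.S. For reference, here is a cleaned-up version of the lxml/RelaxNG idea
quoted above in the verifyXML.py discussion, for validating a single IPv4
criterion value. This is a sketch only, not code from the webrev; the names
(check_ipv4, IPV4_VALIDATOR, OCTET) are made up, and it assumes lxml with
libxml2's XML Schema datatype support for the pattern facet:

    from StringIO import StringIO
    from lxml import etree

    # One dotted-quad octet, 0-255.
    OCTET = "(25[0-5]|2[0-4][0-9]|[01]?[0-9]?[0-9])"

    # RELAX NG fragment describing a single IPv4 address.
    IPV4_RNG = StringIO('''\
    <element name="value" xmlns="http://relaxng.org/ns/structure/1.0"
             datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
      <data type="token">
        <param name="pattern">(%s\.){3}%s</param>
      </data>
    </element>
    ''' % (OCTET, OCTET))

    IPV4_VALIDATOR = etree.RelaxNG(etree.parse(IPV4_RNG))

    def check_ipv4(value):
        # Wrap the bare string in the element the schema expects, then validate.
        # (A wildly malformed value can also raise an XMLSyntaxError here.)
        doc = etree.fromstring("<value>%s</value>" % value.strip())
        if not IPV4_VALIDATOR.validate(doc):
            # Reuse the validator's own message, as a schema failure would.
            raise ValueError(str(IPV4_VALIDATOR.error_log.last_error))
        # Zero-pad each octet to three digits for storage:
        # "1.2.3.4" becomes "001002003004".
        return "".join([octet.zfill(3) for octet in value.strip().split(".")])

A second pattern could cover MAC addresses the same way, and MEM could be
checked against a <data type="integer"/> definition, as suggested above.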