On 11/12/2012 12:39 PM, Petr Viktorin wrote:
On 11/11/2012 11:18 PM, John Dennis wrote:

In the future please do not quote the entire patch in the email. It makes it much too difficult to respond. It took me a long time to remove all the unnecessary text for this reply.

New patch following in a moment, as well as the response to your second email.

Cookie library issues:
I decided to solve the cookie library problems by writing a minimal
cookie library that does what we need and no more than that. It is a
new module in ipapython shared by both client and server and comes
with a new unit test. The module has plenty of documentation, no need
to repeat it here.

This sucks. Designing, implementing and maintaining our own version of
general code is never good. I'd like to discuss with the other
developers so everyone knows what's happening in this patch.

Also, API design is a hard thing to do. I'd prefer if we made a subset
of an existing API (especially if there's any hope of the standard
library being fixed) rather than invented a new one.

Yes it does suck, no one was more dismayed about this than
myself. Here is bug references and expanded justification.

The inability of Cookie.py to parse an Expires attribute was opened
6/10/2008, approximately 4.5 years ago.

http://bugs.python.org/issue3073

To this day it remains open and unfixed. I discovered the bug report
when I also ran afoul of the failed behavior. It was shortly
thereafter I also discovered Cookie.py could not parse the HttpOnly
and Secure boolean flags. There are only 7 defined items in a cookie
and Cookie.py fails to parse 3 of them, close to a 50% failure
rate. The bug has stayed open 4.5 years without a fix making into the
upstream distribution. If we were dependent on getting Cookie.py fixed
we would have to make the core Python packages in RHEL 5, RHEL 6, RHEL
7, Fedora 16, Fedora 17, and Fedora 18 were patched. Tracking and
making sure the fix gets into all those distributions is a signifcant
effort, but at least we own those distributions and have some control
over them. But wait! We also expect the IPA clients to install and
successfully run on other systems as well, Debian, Ubuntu, Solaris,
etc. Who is going to make sure Cookie.py has been patched on all the
releases of all those systems? Frankly I have no confidence the
various bugs in Cookie.py will be taken seriously by upstream after
it's languished 4.5 years without a fix making it into an upstream
release. Nor is it just these 3 parse bugs. I can demonstrate many
other ways Cookie.py fails in both cookie generation and cookie
parsing.

The amount of work needed to assure patched versions of
Cookie.py was available in all the systems we hope to deploy the IPA
client on would be huge. Compare that to spending less than a day
writing a new module and unit test that works and installs with our
client. To my thinking it's a no-brainer which is the better choice
for the team.

But there is another issue with Cookie.py. It's API is horrible, it's
difficult to use and important features for working with cookies are
completely absent. That's just at the API level, it's actual
implementation is frightenly bad. You ask, why don't we at least make
the new module API compatible so we can swap it out later when
upstream finally gets around to fixing the problems. Emulating a
terrible API and then having to extend it into something usable
doesn't make sense to me, nor does it seem a good use of team
resources nor beneficial to the IPA product.

The bug I opened about the other issues in Cookie.py is:

http://bugs.python.org/issue16611

Please add the following paragraph to the cookie.py docstring, so people
know our implementation is incomplete:

It doesn't have every bell and whistle nor can it
handle all the cookie scenarios found in the wild with random servers
and browsers, it does what we need in an RFC manner (but not all RFC
features).

The implementation is complete because it follows the RFC. I was perhaps being too cautious. A lot of the extra code in the other libraries exists to accmodate behavior prior to standardization. Those days are hopefully long behind use. Plus our use of cookies is very constrained, we're only going to connect to a server we control. I'm not really worried at this juncture about historical misbehavior. I'm pretty sure the implementation will catch non-RFC compliant behavior and reject it which is proably the right thing to do.

The only thing I think might be missing is automatically escaping and unescaping reserved characters. But the RFC is actually mum on this topic. It defines a legal character set which the parsing code observes. The libary does not currently validate all values being assigned to a cookie do not contain invalid charaters. It's currently up to the caller to supply valid URL components for the domain and path components which might be subject to escaping with HTML entities. Yes, that would be a good thing to add in the future. Is it essential for getting this in? No I don't think so. We won't assigning anything invalid into to the cookie because we've pre-validated everything prior to invoking this code. I've added a comment to this effect in the docstring.


Request URL issues:

We also had problems in rpc.py whereby information from the request
which is needed when we process the response is not available. Most
important was the requesting URL. It turns out that the way the class
and object relationships are structured it's impossible to get this
information. Someone else must have run into the same issue because
there was a routine called reconstruct_url() which attempted to
recreate the request URL from other available
information. Unfortunately reconstruct_url() was not callable from
inside the response handler. So I decided to store the information in
the thread context and when the request is received extract it from
the thread context. It's perhaps not an ideal solution but we do
similar things elsewhere so at least it's consistent. I removed the
reconstruct_url() function because the exact information is now in the
context and trying to apply heuristics to recreate the url is probably
not robust.

I'm no fan of global state, but I guess there's no other way :(

Neither am I. But this is a classic example of how the core Python libraries are messed up.
+from ipapython.ipa_log_manager import *

I understand star imports are current practice for ipa_log_manager, but
I believe it is time to start changing it. Especially since you're just
using one of the 8 names you're importing.
But if you don't agree, keep it and let's save the argument for another day.

Fixed.

Here and in other places, Git complains about whitespace at end of line.

Fixed.


+    def apply_session_cookie(self, url):

This method does too much. Can you split it into one that updates the
global state (and possibly stores the URL there), and another that
modifies the URL?

Refactored by splitting out some of the generic logic into xmlclient.get_session_cookie_from_persistent_storage() and Cookie.http_return_ok(). This makes the routine a smaller and isolates some common logic. The logic to decide if the cookie should be returned properly belongs in the Cookie class so moving it there made a lot of sense and it should have been done that way originally. The unit test was also updated to test the new functionality. Plus the logic was expanded to closely follow the RFC requirements.

Refactoring the way you suggest does not work because of the logic in create_connection(). I realize it's a bit awkward but I don't want to refactor too much of the existing code in search of a more elegant organization. creation_connection() is critical code and the risk of breaking something in search of a more elegant organization is not justified just prior to a release.

+        # O.K. session_cookie is valid to be returned, stash it away where it 
will will
+        # get included in a HTTP Cookie headed sent to the server.
+        self.log.debug("setting session_cookie into context '%s'", 
session_cookie)
+        setattr(context, 'session_cookie', session_cookie.http_cookie())

I don't understand why setattr is needed here. Why would a simple
      context.session_cookie = session_cookie.http_cookie()
not work?
More of the same below.

This is current practice across the entire code base. I don't know why it's done that way. I see little value in deviating from existing practice, that only creates more confusion without benefit. Open a code clean-up ticket on this issue if you wish, that will assure the idiom consistent everywhere.

-                if session_data and e.errcode == 401:
+                if context.has_key('session_cookie') and e.errcode == 401:

What is has_key? Did you mean to use hasattr?

Good catch! Yes I meant hasattr.


                       # Unauthorized. Remove the session and try again.
-                    delattr(context, 'session_data')
+                    delattr(context, 'session_cookie')

Since you're touching the code, can you switch to `del
context.session_cookie`? (again, ignore this comment if there's some
deeper magic you're working around)
More of the same below.

Again, because this is the current idiom, see above comment about consistency and opening a code clean-up ticket.

+    # Debug flag for class
+    debug = False

We have a fancy logging module that allows us to easily tune loggers for
individual classes. Please use it instead of inventing a mechanism based
on class-wide flags and print statements.

Opps, sorry, that was left over from when I developed the code stand-alone and forgot to delete it. Bye-Bye!


+
+    @classmethod
+    def datetime_to_time(cls, dt):

As far as I can see, this method isn't used anywhere. Why add it?

Because it would not be unreasonable to need to convert a naive datetime object in UTC to a UNIX time value (in fact I needed that in the unit test). I discovered it's not obvious how to do this correctly. I scratched my head pouring over the Python doc for the datetime and time modules and couldn't figure it out. A Google search revealed I was not alone in having trouble figuring this out. So rather than having others get it wrong or waste more time figuring it out I thought it reasonable to provide the utility.

+    @classmethod
+    def datetime_to_string(cls, dt=None):


Why is this a class method and not a normal function? It doesn't use cls
at all, and there's no value in overriding it in subclasses.
Same for the other classmethods.

Good question, I think my reasoning for doing it this way is valid though. Handling datetime objects with timezones is awkward, it looks like timezone support was added after datetime objects were in use. Instead of having two datetime classes (one local and one with zone support) datetime objects have an internal state which makes them naive or timezone aware. By default you get naive objects. Adding zone support adds a bit of complexity that's not natively part of the module. This is further complicated by the fact you can't mix naive and aware objects (comparisons will raise exceptions, etc.). The best approach is to use the default naive object and follow a convention over the zone they're in (e.g. UTC).

Aside from the time zone conventions there are strict requirements to meet RFC requirements on the string representation of these values when used in the context of a cookie. We need some way to assure the datetime values adhere to the requirements of the class.

That makes it reasonable to "attach" methods observing the usage requirement in the class to the class. Making them stand alone functions does nothing to indicate nor enforce there is a specific context in which these need to be applied, class methods provide that context.

The other uses of class methods are several of the parsing methods. They are class methods because they act as a class factory returning one or more class instances.

+        s = datetime.datetime.strftime(dt, '%a, %d %b %Y %H:%M:%S')
+        return s + ' GMT'

AFAIK `return datetime.datetime.strftime(dt, '%a, %d %b %Y %H:%M:%S
GMT')` would work just as well.

Sure, no problem, changed.

+    @property
+    def expires(self):
+        '''
+        The expiration timestamp (in UTC) as a datetime object for the
+        cookie, or None if not set.
+
+        You may assign a value of None, a datetime object, a numeric
+        UNIX timestamp (seconds since the epoch UTC) or formatted time
+        string (the latter two will be converted to a datetime object.
+        '''
+        return self._expires
+
+    @expires.setter
+    def expires(self, value):
+        if value is None:
+            self._expires = None
+        elif isinstance(value, datetime.datetime):
+            self._expires = value
+        elif isinstance(value, (int, long, float)):
+            self._expires = datetime.datetime.utcfromtimestamp(value)
+        elif isinstance(value, basestring):
+            self._expires = Cookie.parse_datetime(value)
+        else:
+            raise TypeError('value must be datetime, int, long, float, 
basestring or None, not %s' % \
+                            value.__class__.__name__)

Are the automatic conversions necessary? I'd rather call an explicit
set_expires_from_timestamp than have the attribute silently changed,
with semantics depending on the type.

This is the wrong kind of “easy to use”. It should be “easy to read and
understand”, not “less characters to type” (with magic to determine what
you meant).

Automatic conversions are valuable and useful. The value stored in the object *must* be of the type the object expects. However it's not uncommon to have the value be in some other format. The goal is

* Make the API easy to use.

* Assure the API is used correctly.

* Not to obscure the callers logic with conversion logic every place the API is used. It is vastly more important to keep the calling clean and easy to read because it's used far more often than the single place the class is implemented.

It is correct to put this logic in exactly one location, namely the class implementation. If the class implementation has a bit of complexity that's appropriate, far better the complexity is in the class and not widely distributed across all uses of the class where it will make reading the logic harder and be prone to incorrect usage. Automatic type conversion is common practice in other object orientated language for good reason.


+    def set_attr(self, name, value):
+        '''
+        Sets one of the predefined cookie attributes.
+        '''
+        attr_name = Cookie.attrs.get(name.lower(), None)
+        if attr_name is None:
+            raise ValueError("unknown cookie attribute '%s'" % name)
+        setattr(self, attr_name, value)

Please don't include this in the public interface, there's no need for
users to call it.
It would be much better to inline the function in the only place that
calls it.

Good point but I'd rather see it separated in a subroutine rather than cluttering up the other logic. I made it a class private method.


+
+    def __str__(self):
+        result = "%s=%s;" % (self.key, self.value)
+
+        if self.domain is not None:
+            result += " Domain=%s;" % self.domain
+
+        if self.path is not None:
+            result += " Path=%s;" % self.path
+
+        if self.max_age is not None:
+            result += " Max-Age=%s;" % self.max_age
+
+        if self.expires is not None:
+            result += " Expires=%s;" % Cookie.datetime_to_string(self.expires)
+
+        if self.secure:
+            result += " Secure;"
+
+        if self.httponly:
+            result += " HttpOnly;"
+
+        return result[:-1]      # strip trainling semi-colon

Concatenating strings is slow; please consider building a list and then
`return '; '.join(result)`.

O.K., fixed.



--
John Dennis <jden...@redhat.com>

Looking to carve out IT costs?
www.redhat.com/carveoutcosts/

_______________________________________________
Freeipa-devel mailing list
Freeipa-devel@redhat.com
https://www.redhat.com/mailman/listinfo/freeipa-devel

Reply via email to