I don't really have enough context (or, at the moment, time) to do a serious review. It may well be that you are safe. iri_to_uri() looks like the key, since you will almost certainly trip over a value that isn't eligible as a URL (clients cut and paste from MS Word and equivalents all the time).
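To make concrete what a conversion of this kind does, here is a minimal sketch (written for Python 3, not the Python 2.6 in this thread). The helper name iri_to_uri_sketch and the SAFE character set are my own assumptions for illustration, not Django's actual code: characters outside the safe set are percent-encoded from their UTF-8 bytes, and plain ASCII URLs pass through unchanged. Non-ASCII host names are a separate matter that this sketch does not address.

```python
from urllib.parse import quote

# Characters that are passed through untouched. This particular set is an
# assumption for the sketch; letters, digits and "_.-~" are always kept.
SAFE = "/#%[]=:;$&()+,!?*@'~"


def iri_to_uri_sketch(iri):
    """Percent-encode every character outside SAFE, using its UTF-8 bytes."""
    return quote(iri, safe=SAFE)


# A non-ASCII path character becomes its UTF-8 bytes, percent-encoded.
print(iri_to_uri_sketch("http://example.com/verhütung.html"))

# A pure-ASCII URL is returned unchanged.
print(iri_to_uri_sketch("http://example.com/a?b=c&d=1"))
```

The result is always plain ASCII, which is why it is safe to hand to interfaces that only accept ASCII.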
But I would like to challenge your thought that you shouldn't worry about encoding and decoding. Everything is encoded. Even UTF-32 is an encoding. You already have a database, and changing can be a pain, but I'd like to lobby you, in future projects, to select UTF-8 for your database encoding. For true ASCII characters (ord(c) < 128), in a world of 8-bit bytes, the UTF-8 encoding requires the same number of bytes as the ASCII encoding for both transmission and storage (they are the same byte values). In environments where the local accented characters are common, you might make an argument that Latin-N will save you space, but it also means that there are characters that you can't represent.

The most trouble-free situation is when you use Unicode strings in Python, as Django tries hard to do, and properly configure your interfaces (HTTP, database) to do the appropriate encoding and decoding.

Bill

On Thu, Jul 22, 2010 at 8:58 AM, Yateen <[email protected]> wrote:
> Ok, I did some changes and things look to be working.
>
> My intention was to receive URLs, parse them to get the base URL, put
> them in database (Postgres), and then through a http query, through
> Django interface through psycopg2, retrieve these URLs and display
> those to the user on the browser in a table.
>
> I do not know whether I should be really worrying about encoding/
> decoding here as I just wanted to get the chars as they were coming.
> Hence, I tried below changes and they were working fine.
>
> I am giving some samples of the URLs that I processed which, with the
> help of below changes, could work fine.
>
> http://003-sexo-mulheres-nuas.ck7.net
> http://live.žšcr.com/host <<some chars before cr can not be copied
> here.
> http://östrogenfrei.de/verhuetung.html
>
>
> I needed below changes -
>
> - keep the Postgres client encoding to sql_ascii.
>
> - Make below changes in Django in following modules -
>
> ./python2.6.1/lib/python2.6/site-packages/django//contrib/syndication/
> feeds.py
> Change for applying Unicode on our URLs and data which is probably
> unnecessary. The iri_to_uri is harmless, but works for us.
>
> 135,136c135
> < url = iri_to_uri(enc_url),
> < #url = smart_unicode(enc_url),
> ---
>> url = smart_unicode(enc_url),
> 138,139c137
> < mime_type =
> iri_to_uri(self.__get_dynamic_attr('item_enclosure_mime_type', item))
> < #mime_type =
> smart_unicode(self.__get_dynamic_attr('item_enclosure_mime_type',
> item))
> ---
>> mime_type =
>> smart_unicode(self.__get_dynamic_attr('item_enclosure_mime_type', item))
>
>
> ./python2.6.1/lib/python2.6/site-packages/django//db/backends/
> postgresql/base.py
> Same philosophy as above
> Additionally, using sql_ascii as character set wherever possible.
>
> 46,47c46
> < #result[smart_str(key, charset)] = smart_str(value,
> charset)
> < result[smart_str(key, charset)] = iri_to_uri(value)
> ---
>> result[smart_str(key, charset)] = smart_str(value, charset)
> 50,51c49
> < return tuple([iri_to_uri(p) for p in params])
> < #return tuple([smart_str(p, self.charset, True) for p in
> params])
> ---
>> return tuple([smart_str(p, self.charset, True) for p in params])
> 54,55c52
> < return self.cursor.execute(iri_to_uri(sql),
> self.format_params(params))
> < #return self.cursor.execute(smart_str(sql, self.charset),
> self.format_params(params))
> ---
>> return self.cursor.execute(smart_str(sql, self.charset),
>> self.format_params(params))
> 128c125
> < cursor = UnicodeCursorWrapper(cursor, 'sql_ascii')
> ---
>> cursor = UnicodeCursorWrapper(cursor, 'utf-8')
> 137,138c134
> < #return smart_unicode(s)
> < return iri_to_uri(s)
> ---
>> return smart_unicode(s)
>
>
> ./python2.6.1/lib/python2.6/site-packages/django//db/backends/
> postgresql_psycopg2/base.py
> Need to disable psycopg2 extensions as Unicode as this is not needed.
> We can safely expect whatever data we get from DJango interface to be
> of our use.
> 25c25
> < #psycopg2.extensions.register_type(psycopg2.extensions.UNICODE)
> ---
>> psycopg2.extensions.register_type(psycopg2.extensions.UNICODE)
>
> The below changes looks redundant now, things are working even w/o
> this one.
> ./python2.6.1/lib/python2.6/site-packages/django//db/models/base.py
> Setting the encoding to ascii.
> 277,278c277
> < return force_unicode(self).encode('ascii')
> < #return force_unicode(self).encode('utf-8')
> ---
>> return force_unicode(self).encode('utf-8')
>
>
>
> For the purpose of processing, my views.py needed to process the URLs
> in a slightly different before rendering the response back to html -
> import urllib
> url = urllib.quote_plus(received_url)
>
> Also, in the html file, where I was processing the URL, I needed to
> 'unescape' my url.
>
>
> Here is my request/query -
>
> Can you please review these changes and this approach? Do you see any
> major issue here?
> I am sure there must be some purpose in not having this approach
> earlier, but just wondering why?
>
> Thanks,
>
> --
> You received this message because you are subscribed to the Google Groups
> "Django users" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/django-users?hl=en.
>
>

