Hello,

Thank you for your help, but so far I have not had any success. I am reading the lines from a file.
Here is the kind of error I am getting:

the current line is :07 ArdFche

matches ('07', 'Ard\xe8che\r\n')
Traceback (most recent call last):
  File "E:\instal\django\view_servicealapersonne\votreservice\_initialLoad\loader_departements.py", line 19, in ?
    r =Department(department=matches[1].encode('utf-8'),department_number=matches[0].encode('utf-8'),country="France")
  File "C:\Python24\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 3-5: invalid data

The word that is causing trouble is Ardèche.

In order to parse my data I am using the following script:

=============== My script to parse the data =============
import os,codecs
from django.models.announceManager import *
os.chdir(os.path.abspath("E:\\instal\\django\\view_servicealapersonne\\votreservice\\_initialLoad"))
f = codecs.open("departments.txt",encoding='utf-8')
import re
regexobj = re.compile("([0-9]+)\s+([\w\s?]+)",re.UNICODE)
for l in f.xreadlines():
    print "the current line is :"+l
    try:
        matches = regexobj.search(l).groups()
    except:
        print "this is an empty line"
    print "matches " +str(matches)
    r =Department(department=matches[1].encode('utf-8'),department_number=matches[0].encode('utf-8'),country="France")
    r.save()
    print "ok"
    print r
f.close()

================= my data =================
01 Ain
02 Aisne
03 Allier
04 Alpes de Haute Provence
05 Hautes Alpes
06 Alpes Maritimes
07 Ardèche <============ this line crashes
08 Ardennes
09 Ariège

Thank you for your help.
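For reference, here is a minimal sketch of what may be going on. It assumes that departments.txt was saved as Latin-1 (or cp1252) rather than UTF-8, which the byte \xe8 in the "matches" output suggests but does not prove. The helper name decode_line is hypothetical and only used for illustration.

```python
# Raw bytes of the failing line, copied from the "matches" output above.
raw = b"07 Ard\xe8che\r\n"


def decode_line(data, encodings=("utf-8", "latin-1")):
    """Try each encoding in order and return (text, encoding_used).

    Hypothetical helper: latin-1 accepts every byte value, so with the
    default list the final ValueError is never reached.
    """
    for enc in encodings:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the encodings could decode the data")


# In UTF-8, 0xE8 starts a three-byte sequence, but the next byte is 'c',
# so the utf-8 attempt fails and the latin-1 attempt is the one that works.
text, used = decode_line(raw)

# Once the line is a unicode string, encoding it to UTF-8 is safe.
utf8_bytes = text.strip().encode("utf-8")

print(used)
print(repr(utf8_bytes))
```

If that assumption holds, opening the file with codecs.open("departments.txt", encoding="latin-1") instead of encoding='utf-8' might be enough in the script above. The sketch uses the b"..." literal syntax, which needs Python 2.6 or later; on Python 2.4 a plain string literal plays the same role.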