Hi Darcy, this may not be the most elegant answer, but with the basic testing I can do now I believe it will work.
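To make the idea concrete before the actual edit, here is a small standalone sketch of the failure and of the helper described in this message. It is not Arches code. It is written to run on Python 3 as well, so the implicit ASCII step that Python 2's str() performs is spelled out explicitly, and the names ascii_then_utf8 and demo are made up purely for illustration.

```python
# Standalone sketch, not Arches code. On Python 3 the implicit ASCII step
# of Python 2's str() has to be written out to reproduce the error.

def ascii_then_utf8(val):
    """Mimic what Python 2's str(v).encode('utf8') does to a text value:
    an ASCII encode first, then a UTF-8 encode of the result."""
    return val.encode('ascii').decode('ascii').encode('utf-8')


def encode_utf8(val):
    """Encode text straight to UTF-8; convert non-text values to text first."""
    try:
        return val.encode('utf-8')
    except AttributeError:
        # integers and similar values have no .encode() method
        return str(val).encode('utf-8')


def demo():
    dash = u'\u2013'  # EN DASH, the character from the traceback
    try:
        ascii_then_utf8(dash)
        failed = False
    except UnicodeEncodeError:
        # the ASCII step fails before UTF-8 is ever reached
        failed = True
    return failed, encode_utf8(dash), encode_utf8(220)
```

Calling demo() should show that the two-step version fails on the dash, while the helper handles both the dash and a plain integer.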
Add the following function to the top of the file in question <https://github.com/archesproject/arches/blob/stable/3.x/arches/management/commands/packages.py#L352>, right under the import statements:

    def encode_utf8(val):
        try:
            # unicode (text) values encode directly to UTF-8
            utf8 = val.encode('utf-8')
        except AttributeError:
            # non-text values such as integers have no .encode(); convert them first
            utf8 = str(val).encode('utf-8')
        return utf8

Then change this line

    csvwriter.writerow({k: str(v).encode('utf8') for k, v in csv_record.items()})

to

    csvwriter.writerow({k: encode_utf8(v) for k, v in csv_record.items()})

Let me know if you have any trouble with that. It will probably be easiest to just modify the file in your virtual environment.

Adam

On Mon, Sep 18, 2017 at 8:04 PM, Darcy Christ <[email protected]> wrote:

> Hi Adam,
>
> Still not sure how to get past this. I see it is an n-dash (I should have
> looked this up, rather than assumed it was related to Chinese).
>
> The question for me is why is this code assuming all ascii since it
> allowed an ndash into the database?
>
> Is there a way to fix this code, rather than update the content?
>
> Darcy
>
> On Saturday, September 16, 2017 at 12:59:44 AM UTC+10, Adam Cox wrote:
>>
>> Darcy, it looks like this is not a Chinese character, but a long dash.
>>
>> This issue seems to be well-summed up in this stack exchange answer:
>> https://stackoverflow.com/a/5387966/3873885
>>
>> Essentially, in this line
>>
>> csvwriter.writerow({k: str(v).encode('utf8') for k, v in csv_record.items()})
>>
>> the str() operation is encoding v (which in this case is a unicode
>> object) to ascii, the default encoding for a str object in python 2.7.
>> Then, that ascii-encoded string is further encoded into utf-8. I assume the
>> initial str() operation is meant to handle integers and other non-text
>> objects, but you've found an example where because u'\u2013' (unicode
>> character 2013
>> <http://www.fileformat.info/info/unicode/char/2013/index.htm>) cannot be
>> encoded in ascii, it hits an error even before it has a chance to encode to
>> utf-8.
>>
>> So, I think that line in the code could be improved.
>>
>> It looks like that line comes from the related resource export
>> <https://github.com/archesproject/arches/blob/stable/3.x/arches/management/commands/packages.py#L352>
>> process. Maybe you have a long dash in one of the notes that you have about
>> a resource to resource relationship? Otherwise, I'm really not sure where
>> that problem character would come from...
>>
>> Hope that's at least a little helpful.
>>
>> Adam
>>
>> On Thu, Sep 14, 2017 at 8:01 PM, Darcy Christ <[email protected]>
>> wrote:
>>
>>> I am having trouble exporting data from v3
>>>
>>> I have added this to my config:
>>>
>>> EXPORT_CONFIG = os.path.normpath(os.path.join(PACKAGE_ROOT,
>>> 'source_data', 'business_data', 'resource_export_mappings.json'))
>>>
>>> And then I get an error while trying to export. Given that it is related
>>> to encoding, could it be any Chinese characters I might have in the data?
>>>
>>> (hkarches) [hkarches@heritage hongkong]$ python manage.py packages -o export_resources -d '../hongkong_data'
>>> operation: export_resources
>>> package: hongkong
>>> Writing 3 ACTIVITY.E7 resources
>>> Writing 370 INFORMATION_RESOURCE.E73 resources
>>> Writing 1205 HERITAGE_RESOURCE.E18 resources
>>> Writing 545 ACTOR.E39 resources
>>> Writing 6 HISTORICAL_EVENT.E5 resources
>>> Writing 0 HERITAGE_RESOURCE_GROUP.E27 resources
>>> Traceback (most recent call last):
>>>   File "manage.py", line 28, in <module>
>>>     execute_from_command_line(sys.argv)
>>>   File "/home/hkarches/lib/python2.7/site-packages/django/core/management/__init__.py", line 399, in execute_from_command_line
>>>     utility.execute()
>>>   File "/home/hkarches/lib/python2.7/site-packages/django/core/management/__init__.py", line 392, in execute
>>>     self.fetch_command(subcommand).run_from_argv(self.argv)
>>>   File "/home/hkarches/lib/python2.7/site-packages/django/core/management/base.py", line 242, in run_from_argv
>>>     self.execute(*args, **options.__dict__)
>>>   File "/home/hkarches/lib/python2.7/site-packages/django/core/management/base.py", line 285, in execute
>>>     output = self.handle(*args, **options)
>>>   File "/home/hkarches/lib/python2.7/site-packages/arches/management/commands/packages.py", line 106, in handle
>>>     self.export_resources(package_name, options['dest_dir'])
>>>   File "/home/hkarches/lib/python2.7/site-packages/arches/management/commands/packages.py", line 351, in export_resources
>>>     csvwriter.writerow({k: str(v).encode('utf8') for k, v in csv_record.items()})
>>>   File "/home/hkarches/lib/python2.7/site-packages/arches/management/commands/packages.py", line 351, in <dictcomp>
>>>     csvwriter.writerow({k: str(v).encode('utf8') for k, v in csv_record.items()})
>>> UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 220: ordinal not in range(128)
>>>
>>> --
>>> -- To post, send email to [email protected]. To unsubscribe,
>>> send email to [email protected]. For more information,
>>> visit https://groups.google.com/d/forum/archesproject?hl=en
>>> ---
>>> You received this message because you are subscribed to the Google
>>> Groups "Arches Project" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> For more options, visit https://groups.google.com/d/optout.
