Well, this has been an adventure, but I think I'm finally on the road to success. Loading the corrected data, which includes a lot of accented words, into a temp table hasn't been easy. With UTF8, the load fails on the first cedilla found, e.g. façade. Using "set client_encoding to 'LATIN1';" gets the data into the temp table, but then it can't be viewed. The error is:

    'utf-8' codec can't decode byte 0xe7 in position 97: invalid continuation byte
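For anyone curious why the viewer chokes on byte 0xe7: in Latin-1 a ç is the single byte 0xe7, but UTF-8 reads 0xe7 as the start of a three-byte sequence, so the plain letter that follows it is reported as an "invalid continuation byte". Here is a minimal Python 3 sketch of that, assuming the data really is Latin-1. The sample word comes from the message above; the variable names are only illustrative.

```python
# Illustrative only: reproduce the decode error on a single Latin-1 word.
latin1_bytes = "façade".encode("latin-1")  # ç is stored as the one byte 0xe7

try:
    # Reading Latin-1 bytes as if they were UTF-8 fails at the ç.
    latin1_bytes.decode("utf-8")
    error_message = ""
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(error_message)

# Decode with the encoding the bytes were really written in,
# then re-encode as UTF-8 (ç becomes the two bytes 0xc3 0xa7).
text = latin1_bytes.decode("latin-1")
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)
```

Decoding with the correct source encoding and re-encoding as UTF-8 is what makes the text readable again; guessing the wrong source encoding would silently produce wrong characters instead of an error.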
For other Linux users who might run into the same problem, I found a reference to iconv and had success with

    iconv -f latin1 -t utf-8 original_file > new_file

I now have a table with accented words that I hope will enable me to update the records with corrections.

Thanks,
Martha

On Tuesday, October 8, 2019 at 5:20:29 PM UTC-7, Martha S wrote:
>
> This is great, Bryan, thank you.
>
> Once we repair the garbage that resulted during import, this might enable
> us to have the preferred spellings in our database without issue.
>
> I shall report back once we've tested. I kept wondering how non-English
> installations manage.
>
> Martha
>
> On Tuesday, October 8, 2019 at 4:08:01 PM UTC-7, Bryan Alvey wrote:
>>
>> Hi Martha -
>>
>> I'm not sure this is what you are after, but if you are having
>> problems importing/exporting non-ASCII characters to/from Arches, this may
>> help. We uploaded a Cyrillic data set by CSV successfully using this.
>>
>> https://github.com/archesproject/arches/issues/2831
>>
>> For those who have this issue on Apache2, here is what solved my problem:
>>
>> (https://code.djangoproject.com/wiki/django_apache_and_mod_wsgi#AdditionalTweaking)
>>
>> If you're taking advantage of the great Internationalization features of
>> Django you may come across a curious problem. Namely, uploading of
>> non-ascii filenames with the Django storage system with the default apache
>> settings on most systems will trigger UnicodeEncodeError exceptions when
>> calling functions like os.path(). To avoid these issues, ensure that the
>> following lines are included in your apache envvars file (typically found
>> in /etc/apache2/envvars).
>>
>> export LANG='en_US.UTF-8'
>> export LC_ALL='en_US.UTF-8'
>>
>> This error likely won't rear its head during development on the test
>> server as, when run from the command line, the ./manage.py script inherits
>> the user's language and locale settings.
>>
>> Not sure this is what you want, but I thought it may help.
>>
>> Bryan
>>
>> On Saturday, 28 September 2019 01:13:59 UTC+1, Martha S wrote:
>>>
>>> I am trying to export all the data for a particular resource model to
>>> CSV for review and modification and ran into an error during the process:
>>>
>>> UnicodeEncodeError: 'ascii' codec can't encode character u'\xa6' in position 51: ordinal not in range(128)
>>>
>>> *My command*
>>> python manage.py packages -o export_business_data -d '/hpladata/Projects/Downloads/Historic District Mapping Files' -f 'csv' -c '/hpladata/Projects/Downloads/Historic District Mapping Files/Historic District.mapping'
>>>
>>> *Here's the full error dump*
>>> operation: export_business_data
>>> Traceback (most recent call last):
>>>   File "manage.py", line 29, in <module>
>>>     execute_from_command_line(sys.argv)
>>>   File "/usr/local/lib/python2.7/dist-packages/django/core/management/__init__.py", line 364, in execute_from_command_line
>>>     utility.execute()
>>>   File "/usr/local/lib/python2.7/dist-packages/django/core/management/__init__.py", line 356, in execute
>>>     self.fetch_command(subcommand).run_from_argv(self.argv)
>>>   File "/usr/local/lib/python2.7/dist-packages/django/core/management/base.py", line 283, in run_from_argv
>>>     self.execute(*args, **cmd_options)
>>>   File "/usr/local/lib/python2.7/dist-packages/django/core/management/base.py", line 330, in execute
>>>     output = self.handle(*args, **options)
>>>   File "/Projects/prod/arches/arches/management/commands/packages.py", line 190, in handle
>>>     self.export_business_data(options['dest_dir'], options['format'], options['config_file'], options['graphs'], options['single_file'])
>>>   File "/Projects/prod/arches/arches/management/commands/packages.py", line 770, in export_business_data
>>>     data = resource_exporter.export(graph_id=graph, resourceinstanceids=None)
>>>   File "/Projects/prod/arches/arches/app/utils/data_management/resources/exporter.py", line 37, in export
>>>     resources = self.writer.write_resources(graph_id=graph_id, resourceinstanceids=resourceinstanceids)
>>>   File "/Projects/prod/arches/arches/app/utils/data_management/resources/formats/csvfile.py", line 194, in write_resources
>>>     csvs_for_export = csvs_for_export + self.write_resource_relations(file_name=self.file_name)
>>>   File "/Projects/prod/arches/arches/app/utils/data_management/resources/formats/csvfile.py", line 215, in write_resource_relations
>>>     csvwriter.writerow({k:str(v) for k,v in relation.items()})
>>>   File "/Projects/prod/arches/arches/app/utils/data_management/resources/formats/csvfile.py", line 215, in <dictcomp>
>>>     csvwriter.writerow({k:str(v) for k,v in relation.items()})
>>> UnicodeEncodeError: 'ascii' codec can't encode character u'\xa6' in position 51: ordinal not in range(128)
>>>
>>> Any suggestions?
>>>
>>> Thanks,
>>> Martha
