Well, this has been an adventure, but I think I'm finally on the road to 
success. Loading the corrected data, which includes a lot of accented words, 
into a temp table hasn't been easy. UTF8 fails on the first cedilla found, 
e.g. façade. Using "set client_encoding to 'LATIN1';" gets the data into 
the temp table, but then it can't be viewed. The error is: 'utf-8' codec 
can't decode byte 0xe7 in position 97: invalid continuation byte.

For other Linux users who might run into the same problem, I came across a 
reference to iconv and had success with:
iconv -f latin1 -t utf-8 original_file > new_file
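
For anyone without iconv handy, here is a rough Python 3 sketch of what I 
understand that conversion to be doing. It is only an illustration: the 
function name latin1_to_utf8 and the sample word are my own, and it assumes 
the original file really is Latin-1 throughout.

```python
def latin1_to_utf8(raw: bytes) -> bytes:
    """Re-encode Latin-1 bytes as UTF-8 (hypothetical helper, not part of Arches)."""
    return raw.decode("latin-1").encode("utf-8")


# In Latin-1 the cedilla is the single byte 0xe7.
sample = "façade".encode("latin-1")

# Reading those bytes as UTF-8 fails: 0xe7 announces a multi-byte sequence,
# but the byte that follows is a plain ASCII letter, not a continuation byte.
decode_error = None
try:
    sample.decode("utf-8")
except UnicodeDecodeError as exc:
    decode_error = exc

# After conversion the cedilla becomes the two-byte sequence 0xc3 0xa7.
converted = latin1_to_utf8(sample)
print(decode_error)
print(converted)
```

For a whole file, the same idea would be to open the original with 
encoding="latin-1" and write the copy with encoding="utf-8".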

I now have a table with accented words that I hope will enable me to update 
the records with corrections.

Thanks,
Martha




 

On Tuesday, October 8, 2019 at 5:20:29 PM UTC-7, Martha S wrote:
>
> This is great, Bryan, thank you.
>
> Once we repair the garbage that resulted during import, this might enable 
> us to have the preferred spellings in our database without issue. 
>
> I shall report back once we've tested. I kept wondering how non-English 
> installations manage.
>
> Martha
>
>
>
> On Tuesday, October 8, 2019 at 4:08:01 PM UTC-7, Bryan Alvey wrote:
>>
>> Hi Martha - 
>>
>> I'm not sure this is what you are after, but if you are having 
>> problems importing/exporting non-ASCII characters to/from Arches, this may 
>> help. We uploaded a Cyrillic data set by CSV successfully using this.
>>
>> https://github.com/archesproject/arches/issues/2831
>>
>>
>> For those who have this issue on Apache2, here is what solved my problem:
>>
>>
>> (https://code.djangoproject.com/wiki/django_apache_and_mod_wsgi#AdditionalTweaking)
>>
>>
>> If you're taking advantage of the great Internationalization features of 
>> Django you may come across a curious problem. Namely, uploading of 
>> non-ascii filenames with the Django storage system with the default apache 
>> settings on most systems will trigger UnicodeEncodeError exceptions when 
>> calling functions like os.path(). To avoid these issues, ensure that the 
>> following lines are included in your apache envvars file (typically found 
>> in /etc/apache2/envvars).
>>
>>
>> export LANG='en_US.UTF-8'
>> export LC_ALL='en_US.UTF-8'
>>
>>
>> This error likely won't rear its head during development on the test 
>> server as, when run from the command line, the ./manage.py script inherits 
>> the user's language and locale settings.
>>
>>
>>
>>
>> Not sure this is what you want, but I thought it may help.
>>
>>
>> Bryan
>>
>>
>>
>> On Saturday, 28 September 2019 01:13:59 UTC+1, Martha S wrote:
>>>
>>> I am trying to export all the data for a particular resource model to 
>>> CSV for review and modification and ran into an error during the process:
>>> UnicodeEncodeError: 'ascii' codec can't encode character u'\xa6' in position 51: ordinal not in range(128)
>>>  
>>> *My command*
>>> python manage.py packages -o export_business_data -d '/hpladata/Projects/Downloads/Historic District Mapping Files' -f 'csv' -c '/hpladata/Projects/Downloads/Historic District Mapping Files/Historic District.mapping'
>>>
>>> *Here's the full error dump*
>>> operation: export_business_data
>>> Traceback (most recent call last):
>>>   File "manage.py", line 29, in <module>
>>>     execute_from_command_line(sys.argv)
>>>   File "/usr/local/lib/python2.7/dist-packages/django/core/management/__init__.py", line 364, in execute_from_command_line
>>>     utility.execute()
>>>   File "/usr/local/lib/python2.7/dist-packages/django/core/management/__init__.py", line 356, in execute
>>>     self.fetch_command(subcommand).run_from_argv(self.argv)
>>>   File "/usr/local/lib/python2.7/dist-packages/django/core/management/base.py", line 283, in run_from_argv
>>>     self.execute(*args, **cmd_options)
>>>   File "/usr/local/lib/python2.7/dist-packages/django/core/management/base.py", line 330, in execute
>>>     output = self.handle(*args, **options)
>>>   File "/Projects/prod/arches/arches/management/commands/packages.py", line 190, in handle
>>>     self.export_business_data(options['dest_dir'], options['format'], options['config_file'], options['graphs'], options['single_file'])
>>>   File "/Projects/prod/arches/arches/management/commands/packages.py", line 770, in export_business_data
>>>     data = resource_exporter.export(graph_id=graph, resourceinstanceids=None)
>>>   File "/Projects/prod/arches/arches/app/utils/data_management/resources/exporter.py", line 37, in export
>>>     resources = self.writer.write_resources(graph_id=graph_id, resourceinstanceids=resourceinstanceids)
>>>   File "/Projects/prod/arches/arches/app/utils/data_management/resources/formats/csvfile.py", line 194, in write_resources
>>>     csvs_for_export = csvs_for_export + self.write_resource_relations(file_name=self.file_name)
>>>   File "/Projects/prod/arches/arches/app/utils/data_management/resources/formats/csvfile.py", line 215, in write_resource_relations
>>>     csvwriter.writerow({k:str(v) for k,v in relation.items()})
>>>   File "/Projects/prod/arches/arches/app/utils/data_management/resources/formats/csvfile.py", line 215, in <dictcomp>
>>>     csvwriter.writerow({k:str(v) for k,v in relation.items()})
>>> UnicodeEncodeError: 'ascii' codec can't encode character u'\xa6' in position 51: ordinal not in range(128)
>>>
>>> Any suggestions?
>>>
>>> Thanks,
>>> Martha
>>>
>>

