Hi Lucy,

character encodings are one of those nasty issues in computing that nobody 
likes tackling. If you want a detailed, yet fairly easy to follow analysis on 
why that is, see http://www.joelonsoftware.com/articles/Unicode.html (Cthulhu 
is waiting for you there though...)


Basically, what Arches does (expecting UTF-8) is the best thing possible. That
way most human languages can be integrated in Arches, and all you need to do is
make sure your data is UTF-8. Unfortunately Excel makes that bloody impossible.
I think Excel saves that file in the ISO-8859-1 encoding. That encoding just
doesn't know the characters you're trying to save (ISO-8859-1 only contains 191
characters). So, it's not just Arches. I can't read them either. (Excel should
be telling you when saving as CSV that you will lose information.) Either way,
it still wouldn't work, since your csv file already contains illegal ISO-8859-1
characters.
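
To make the mismatch concrete, here is a minimal Python 3 sketch. It takes the name from the original report, encodes it once as ISO-8859-1 and once as UTF-8, and tries to read both as UTF-8. The helper name `decodes_as_utf8` is made up for this illustration. Note that the î itself does exist in ISO-8859-1 (as the single byte 0xEE), so in this sketch the failure comes from those bytes being read as UTF-8, not from a missing character.

```python
# The alternative name from the original report.
name = "Ptolemaîos Philadelphos"

latin1_bytes = name.encode("iso-8859-1")  # every character is one byte
utf8_bytes = name.encode("utf-8")         # the î becomes two bytes


def decodes_as_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if the bytes are valid UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(decodes_as_utf8(latin1_bytes))  # the ISO-8859-1 bytes are rejected
print(decodes_as_utf8(utf8_bytes))    # the UTF-8 bytes are accepted
```

A loader that insists on UTF-8 will therefore reject the first byte string and accept the second, even though both look identical in a spreadsheet.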


And it's not just Excel, the whole Windows ecosystem is fundamentally flawed in 
that regard. I myself run Linux where character encoding is handled correctly 
and UTF-8 is the default. No idea how they do it on a Mac.


So, I think using OpenOffice is your best bet. Or just open the csv file you 
have in Notepad++ (or similar text editor), save the file as UTF-8 and fix the 
problems manually. But then you'd have to do that every time you want to change 
something.
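
If re-saving by hand every time gets tedious, the conversion can also be scripted. The following is only a sketch: the function name `convert_to_utf8` and the file names in the usage comment are invented, and the default source encoding `cp1252` (the usual Western European Windows code page) is an assumption about what Excel wrote. If that guess is wrong, the output will contain the wrong characters, so check the result once by eye.

```python
from pathlib import Path


def convert_to_utf8(src, dst, source_encoding="cp1252"):
    """Re-encode a text file to UTF-8 (without BOM).

    source_encoding is an assumption about how the file was saved;
    a UnicodeDecodeError here means the guess was wrong.
    """
    # Work on raw bytes so line endings are left exactly as they were.
    text = Path(src).read_bytes().decode(source_encoding)
    Path(dst).write_bytes(text.encode("utf-8"))


# Example call (hypothetical file names):
# convert_to_utf8("RULER_AUTHORITY_DOCUMENT.csv", "RULER_AUTHORITY_DOCUMENT.utf8.csv")
```

Writing to a separate output file keeps the original untouched in case the assumed source encoding turns out to be wrong.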


Cheers,

Koen


________________________________
From: [email protected] <[email protected]> on behalf of 
Lucy FJ <[email protected]>
Sent: Sunday, 24 January 2016 12:28
To: Arches Project
Subject: Re: [Arches] Diacriticals in authority and .Arches files problems

Hi Koen,

Thank you for this information. I did try out some of the suggestions on Google 
for using Excel to create UTF-8 files, because I like using Excel and know it 
well, but they are overcomplicated and produce a CSV file in UTF-8 with BOM 
format, which I believe will not work in Arches. It looks like I will need to 
download the OpenOffice version as you suggest. Must all files loaded into 
Arches be UTF-8 only?

Lucy

On Friday, January 22, 2016 at 4:24:42 PM UTC+2, Koen Van Daele wrote:

Hi Lucy,


as far as I know Excel (all versions) is notoriously bad at handling things 
like character encodings. This rather old Stack Overflow question seems to 
confirm that:

http://stackoverflow.com/questions/4221176/excel-to-csv-with-utf8-encoding It 
does offer some workarounds, but none of them are very nice.


I would suggest writing your CSV files with LibreOffice/OpenOffice. You should 
be able to install it and it's free. While it's not always an exact replacement 
for Excel, when it comes to character encodings, it just works. By default it 
will save things as UTF-8 (at least under Linux it does) and it will ask you if 
you want to save in a different encoding.


Cheers,

Koen


On Friday, 22 January 2016 15:05:52 UTC+1, Lucy FJ wrote:
Hi Adam and Alexei,

I forgot to add that the diacriticals are in the altnames at rows 132 to 136 
when editing in Excel.

Lucy
----- Original Message -----
From: Adam Cox
To: Lucy Fletcher-Jones
Cc: Alexei Peters ; Arches Project
Sent: Thursday, January 21, 2016 5:36 PM
Subject: Re: [Arches] Diacriticals in authority and .Arches files problems

Hi Lucy, you can check the encoding in Notepad++. Open your authority 
document with that program, and click the Encoding menu. Your file should be 
in "UTF-8" or "UTF-8 without BOM" (depends on the version of Notepad++ you 
have). The î character should work as far as I know...
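
The same check can be done outside an editor. The sketch below is one possible way to do it in Python 3; `check_utf8` is a made-up name, not part of Arches. It reports whether the data starts with a UTF-8 BOM and, if the bytes are not valid UTF-8, where the first bad byte sits.

```python
import codecs


def check_utf8(data: bytes):
    """Hypothetical helper: return (has_bom, error).

    error is None when the data is valid UTF-8, otherwise a short
    description of the first byte that could not be decoded.
    """
    has_bom = data.startswith(codecs.BOM_UTF8)
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # Count newlines before the bad byte to get a 1-based line number.
        line_no = data.count(b"\n", 0, exc.start) + 1
        return has_bom, "byte 0x%02x at offset %d (line %d)" % (
            data[exc.start], exc.start, line_no)
    return has_bom, None


# Example call (hypothetical file name):
# with open("RULER_AUTHORITY_DOCUMENT.csv", "rb") as f:
#     print(check_utf8(f.read()))
```

The line number makes it easier to find the offending row when the file is opened in a spreadsheet.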

On Thu, Jan 21, 2016 at 7:18 AM, 'Lucy Fletcher-Jones' via Arches Project 
<[email protected]> wrote:
Hi Alexei,

Thank you for looking into this. I am glad to hear that Arches should support 
diacriticals.

Here is the error message on loading the 'Ruler' Authority document:

RULER_AUTHORITY_DOCUMENT.csv

ERRORS IN FILE: RULER_AUTHORITY_DOCUMENT.values.csv

ERRORS IN FILE: RULER_AUTHORITY_DOCUMENT.csv

ERROR: Make sure the file is saved with UTF-8 encoding
'utf8' codec can't decode byte 0xea in position 30: invalid continuation byte
Traceback (most recent call last):
  File "/opt/projects/ENV/lib/python2.7/site-packages/arches/management/commands/package_utils/authority_files.py", line 112, in load_authority_file
    for row in rows:
  File "/opt/projects/ENV/lib/python2.7/site-packages/unicodecsv/py2.py", line 217, in next
    row = csv.DictReader.next(self)
  File "/usr/local/lib/python2.7/csv.py", line 104, in next
    row = self.reader.next()
  File "/opt/projects/ENV/lib/python2.7/site-packages/unicodecsv/py2.py", line 128, in next
    for value in row]
  File "/opt/projects/ENV/lib/python2.7/encodings/utf_8_sig.py", line 22, in decode
    (output, consumed) = codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xea in position 30: invalid continuation byte

ERROR in row 31 (Legacyoid (RULER_UID:30) not found.  Make sure your 
ParentConceptid in the

This caused further errors in the Ruler Values files as can be seen from above.
I do not have a copy of the authority file that caused the error as I have since 
corrected it and changed it in a few places. But the alternative name was

Ptolemaîos Philadelphos

and I believe it was the circumflex above the 'i' that caused the problem. 
Certainly when I removed the circumflex, the file loaded OK.

Thank you,
Lucy


----- Original Message -----
From: Alexei Peters
To: Lucy FJ
Cc: Arches Project
Sent: Wednesday, January 20, 2016 8:24 PM
Subject: Re: [Arches] Diacriticals in authority and .Arches files problems

Hi Lucy,
The .arches file should support diacritics.  I'm actually surprised that the 
authority files don't.  I just tested a local file and I was able to add these 
records:

conceptid,PrefLabel,AltLabels,ParentConceptid,ConceptType,Provider
20000001-0000-0000-0000-000000000000,Portland,,CITY_AUTHORITY_DOCUMENT.csv,Index,GCI
20000002-0000-0000-0000-000000000000,San Francisco,The Bay Area,CITY_AUTHORITY_DOCUMENT.csv,Index,GCI
20000003-0000-0000-0000-000000000000,San Jose,San José,CITY_AUTHORITY_DOCUMENT.csv,Index,GCI

Notice that the alt label for San Jose, is San José
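
For anyone who wants to reproduce this kind of test outside Arches, here is a small Python 3 sketch that reads one of the rows above from UTF-8 bytes. It is not the Arches loader (which, per the traceback in this thread, runs on Python 2.7 with unicodecsv); `read_rows` and `SAMPLE` are names invented for this example. It uses the standard `utf-8-sig` codec, which accepts UTF-8 with or without a leading BOM.

```python
import csv
import io

# Header plus the San Jose row from the test file above.
SAMPLE = (
    "conceptid,PrefLabel,AltLabels,ParentConceptid,ConceptType,Provider\n"
    "20000003-0000-0000-0000-000000000000,San Jose,San José,"
    "CITY_AUTHORITY_DOCUMENT.csv,Index,GCI\n"
)


def read_rows(data: bytes):
    """Hypothetical helper: parse UTF-8 CSV bytes into a list of dicts."""
    # "utf-8-sig" strips a leading BOM if there is one.
    text = io.StringIO(data.decode("utf-8-sig"), newline="")
    return list(csv.DictReader(text))


rows = read_rows(SAMPLE.encode("utf-8-sig"))  # encoded with a BOM
print(rows[0]["AltLabels"])
```

If the bytes were saved in another encoding, the `decode` call raises a `UnicodeDecodeError` much like the one reported for the Ruler file.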

Can you share the authority file that you're having trouble with?
Cheers,
Alexei


Director of Web Development - Farallon Geographics, Inc. - 971.227.3173

On Wed, Jan 20, 2016 at 12:32 AM, Lucy FJ <[email protected]> wrote:
Hi all,
We have been loading customised authority files and have noticed that Arches 
rejects words with diacriticals (accents etc). This is not a problem for us as 
we were happy to remove them, and if we really want them we can enter them 
through the RDM. But will this problem occur when loading resource data through 
.arches? We need to input place names as alternative names using diacriticals 
and it would be much easier if we can do this via .arches files. We know we can 
input them using the resource data manager but obviously when dealing with 
about 3000 entries, this is time consuming.
Any ideas?
Lucy

--
-- To post, send email to [email protected]. To unsubscribe, send 
email to [email protected]. For more information, visit 
https://groups.google.com/d/forum/archesproject?hl=en
---
You received this message because you are subscribed to the Google Groups 
"Arches Project" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.

