Hi Lucy,

I just wanted to say that I agree with Koen. I use OpenOffice to 
manipulate my authority documents and resource graphs and to prepare my 
.arches files, and I find that it handles issues such as yours very well. 
I can't recommend it highly enough, having had similar problems with Excel.
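
For files that already exist, the re-encoding Koen describes below (open 
the file, save it as UTF-8) can also be scripted. What follows is only a 
minimal Python 3 sketch, not anything Arches ships: to_utf8 and 
convert_file are made-up names, and the cp1252 fallback is an assumption 
about what Excel on Windows wrote, so the result should be checked by eye.

```python
def to_utf8(raw, fallback="cp1252"):
    """Return the bytes in raw re-encoded as UTF-8 without a BOM.

    fallback is an assumption: the legacy encoding to try when the
    bytes are not valid UTF-8 (cp1252 is a common guess for Excel
    on Windows, but it is only a guess).
    """
    try:
        # utf-8-sig accepts UTF-8 with or without a leading BOM
        # and drops the BOM if one is present.
        text = raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        # Not valid UTF-8, so fall back to the assumed legacy encoding.
        text = raw.decode(fallback)
    return text.encode("utf-8")


def convert_file(src, dst, fallback="cp1252"):
    """Hypothetical helper: write a UTF-8 copy of the file src to dst."""
    with open(src, "rb") as f:
        raw = f.read()
    with open(dst, "wb") as f:
        f.write(to_utf8(raw, fallback))
```

Running convert_file on an authority file and loading the copy leaves the 
original untouched. The traceback further down passes through Python's 
utf_8_sig codec, which suggests a leading BOM would be tolerated on load, 
but that is an inference from the traceback and not something tested here.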

Best wishes,

Richard



On Monday, January 25, 2016 at 9:39:35 AM UTC, Koen Van Daele wrote:
>
> Hi Lucy,
>
>
> Character encodings are one of those nasty issues in computing that nobody 
> likes tackling. If you want a detailed, yet fairly easy to follow analysis 
> of why that is, see http://www.joelonsoftware.com/articles/Unicode.html 
> (Cthulhu is waiting for you there though...)
>
>
> Basically, what Arches does (expecting UTF-8) is the best thing possible. 
> That way most human languages can be integrated in Arches, and all you need 
> to do is make sure your data is UTF-8. Unfortunately, Excel makes that 
> bloody impossible. I think Excel saves that file in the ISO-8859-1 encoding. 
> That encoding just doesn't know the characters you're trying to save 
> (ISO-8859-1 only contains 191 characters). So, it's not just Arches: I can't 
> read them either. Excel should be telling you when saving as CSV that you 
> will lose information. Either way, it still wouldn't work, since your CSV 
> file already contains illegal ISO-8859-1 characters.
>
>
> And it's not just Excel: the whole Windows ecosystem is fundamentally 
> flawed in that regard. I myself run Linux, where character encoding is 
> handled correctly and UTF-8 is the default. No idea how they do it on a Mac.
>
>
> So, I think using OpenOffice is your best bet. Or just open the csv file 
> you have in Notepad++ (or similar text editor), save the file as UTF-8 and 
> fix the problems manually. But then you'd have to do that every time you 
> want to change something.
>
>
> Cheers,
>
> Koen
>
>
> ------------------------------
> *From:* [email protected] <[email protected]> on behalf of 
> Lucy FJ <[email protected]>
> *Sent:* Sunday, 24 January 2016 12:28
> *To:* Arches Project
> *Subject:* Re: [Arches] Diacriticals in authority and .Arches files 
> problems 
>  
> Hi Koen, 
>
> Thank you for this information. I did try out some of the suggestions on 
> Google for using Excel to create UTF-8 files, because I like using Excel 
> and know it well, but they are overcomplicated and produce a CSV file in 
> UTF-8 with BOM format, which I believe will not work in Arches. It looks 
> like I will need to download OpenOffice as you suggest. Must all files 
> loaded into Arches be UTF-8 only?
>
> Lucy
>
> On Friday, January 22, 2016 at 4:24:42 PM UTC+2, Koen Van Daele wrote: 
>>
>> Hi Lucy,
>>
>>
>> As far as I know, Excel (all versions) is notoriously bad at handling 
>> things like character encodings. This rather old Stack Overflow question 
>> seems to confirm that: 
>>
>> http://stackoverflow.com/questions/4221176/excel-to-csv-with-utf8-encoding 
>>
>> It does offer some workarounds, but none of them are very nice.
>>
>>
>> I would suggest writing your CSV files with LibreOffice/OpenOffice. You 
>> should be able to install it and it's free. While it's not always an exact 
>> replacement for Excel, when it comes to character encodings, it just works. 
>> By default it will save things as UTF-8 (at least under Linux it does) and 
>> it will ask you if you want to save in a different encoding.
>>
>>
>> Cheers,
>>
>> Koen
>>
>>
>>
>> On Friday, 22 January 2016 15:05:52 UTC+1, Lucy FJ wrote: 
>>>
>>> Hi Adam and Alexei,
>>>  
>>> I forgot to add that the diacriticals are in the altnames at rows 132 to 
>>> 136 when editing in Excel. 
>>>  
>>> Lucy
>>>
>>> ----- Original Message ----- 
>>> *From:* Adam Cox 
>>> *To:* Lucy Fletcher-Jones 
>>> *Cc:* Alexei Peters ; Arches Project 
>>> *Sent:* Thursday, January 21, 2016 5:36 PM
>>> *Subject:* Re: [Arches] Diacriticals in authority and .Arches files 
>>> problems
>>>
>>> Hi Lucy, you can check the encoding in Notepad++. Open your authority 
>>> document with that program, and click the Encoding menu. Your file should 
>>> be in "UTF-8" or "UTF-8 without BOM" (depends on the version of Notepad++ 
>>> you have). The î character should work as far as I know...
>>>
>>> On Thu, Jan 21, 2016 at 7:18 AM, 'Lucy Fletcher-Jones' via Arches 
>>> Project <[email protected]> wrote:
>>>
>>>> Hi Alexei,
>>>>  
>>>> Thank you for looking into this. I am glad to hear that Arches should 
>>>> support diacriticals. 
>>>>  
>>>> Here is the error message on loading the 'Ruler' Authority document:
>>>>  
>>>> RULER_AUTHORITY_DOCUMENT.csv
>>>>  
>>>> ERRORS IN FILE: RULER_AUTHORITY_DOCUMENT.values.csv
>>>>  
>>>> ERRORS IN FILE: RULER_AUTHORITY_DOCUMENT.csv
>>>>  
>>>> ERROR: Make sure the file is saved with UTF-8 encoding
>>>> 'utf8' codec can't decode byte 0xea in position 30: invalid continuation byte
>>>> Traceback (most recent call last):
>>>>   File "/opt/projects/ENV/lib/python2.7/site-packages/arches/management/commands/package_utils/authority_files.py", line 112, in load_authority_file
>>>>     for row in rows:
>>>>   File "/opt/projects/ENV/lib/python2.7/site-packages/unicodecsv/py2.py", line 217, in next
>>>>     row = csv.DictReader.next(self)
>>>>   File "/usr/local/lib/python2.7/csv.py", line 104, in next
>>>>     row = self.reader.next()
>>>>   File "/opt/projects/ENV/lib/python2.7/site-packages/unicodecsv/py2.py", line 128, in next
>>>>     for value in row]
>>>>   File "/opt/projects/ENV/lib/python2.7/encodings/utf_8_sig.py", line 22, in decode
>>>>     (output, consumed) = codecs.utf_8_decode(input, errors, True)
>>>> UnicodeDecodeError: 'utf8' codec can't decode byte 0xea in position 30: invalid continuation byte
>>>>  
>>>> ERROR in row 31 (Legacyoid (RULER_UID:30) not found.  Make sure your 
>>>> ParentConceptid in the  
>>>>  
>>>> This caused further errors in the Ruler Values files as can be seen 
>>>> from above. 
>>>> I do not have a copy of the authority file that caused the error, as I 
>>>> have since corrected it and changed it in a few places. But the 
>>>> alternative name was 
>>>>  
>>>> Ptolemaîos Philadelphos
>>>>  
>>>> and I believe it was the circumflex above the 'i' that caused the 
>>>> problem. Certainly when I removed the circumflex, the file loaded OK.
>>>>  
>>>> Thank you, 
>>>> Lucy
>>>>  
>>>>  
>>>> ----- Original Message ----- 
>>>>
>>>> *From:* Alexei Peters 
>>>> *To:* Lucy FJ 
>>>> *Cc:* Arches Project 
>>>> *Sent:* Wednesday, January 20, 2016 8:24 PM
>>>> *Subject:* Re: [Arches] Diacriticals in authority and .Arches files 
>>>> problems
>>>>
>>>> Hi Lucy, 
>>>> The .arches file should support diacritics.  I'm actually surprised 
>>>> that the authority files don't.  I just tested a local file and I was able 
>>>> to add these records:
>>>>
>>>> conceptid,PrefLabel,AltLabels,ParentConceptid,ConceptType,Provider
>>>> 20000001-0000-0000-0000-000000000000,Portland,,CITY_AUTHORITY_DOCUMENT.csv,Index,GCI
>>>> 20000002-0000-0000-0000-000000000000,San Francisco,The Bay Area,CITY_AUTHORITY_DOCUMENT.csv,Index,GCI
>>>> 20000003-0000-0000-0000-000000000000,San Jose,San José,CITY_AUTHORITY_DOCUMENT.csv,Index,GCI
>>>>
>>>> Notice that the alt label for San Jose is San José.
>>>>
>>>> Can you share the authority file that you're having trouble with?
>>>> Cheers,
>>>> Alexei
>>>>
>>>>
>>>> Director of Web Development - Farallon Geographics, Inc. - 971.227.3173
>>>>
>>>> On Wed, Jan 20, 2016 at 12:32 AM, Lucy FJ <[email protected]> wrote:
>>>>
>>>>> Hi all,
>>>>> We have been loading customised authority files and have noticed that 
>>>>> Arches rejects words with diacriticals (accents etc.). This is not a 
>>>>> problem for us, as we were happy to remove them, and if we really want 
>>>>> them we can enter them through the RDM. But will this problem occur when 
>>>>> loading resource data through .arches? We need to input place names as 
>>>>> alternative names using diacriticals, and it would be much easier if we 
>>>>> can do this via .arches files. We know we can input them using the 
>>>>> resource data manager, but obviously when dealing with about 3000 
>>>>> entries, this is time consuming.
>>>>> Any ideas?
>>>>> Lucy
>>>>>
>>>>> --
>>>>> -- To post, send email to [email protected]. To unsubscribe, 
>>>>> send email to [email protected]. For more information, 
>>>>> visit https://groups.google.com/d/forum/archesproject?hl=en
>>>>> ---
>>>>> You received this message because you are subscribed to the Google 
>>>>> Groups "Arches Project" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>>> an email to [email protected].
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>
