Hi Lucy,

I just wanted to say that I agree with Koen: I use OpenOffice to edit my authority documents and resource graphs and to prepare my .arches files, and I find that it handles issues such as yours very well. I can't recommend it highly enough after having had similar problems with Excel.

Best wishes,
Richard

On Monday, January 25, 2016 at 9:39:35 AM UTC, Koen Van Daele wrote:
>
> Hi Lucy,
>
> Character encodings are one of those nasty issues in computing that nobody
> likes tackling. If you want a detailed, yet fairly easy to follow analysis
> of why that is, see http://www.joelonsoftware.com/articles/Unicode.html
> (Cthulhu is waiting for you there though...)
>
> Basically, what Arches does (expecting UTF-8) is the best thing possible.
> That way most human languages can be integrated in Arches, and all you need
> to do is make sure your data is UTF-8. Unfortunately Excel makes that bloody
> impossible. I think Excel saves that file in the ISO-8859-1 encoding. That
> encoding just doesn't know the characters you're trying to save (ISO-8859-1
> only contains 191 characters). So, it's not just Arches. I can't read them
> either. [...] (Excel should be telling you when saving as CSV that you will
> lose information), it still wouldn't work since your CSV file already
> contains illegal ISO-8859-1 characters.
>
> And it's not just Excel, the whole Windows ecosystem is fundamentally
> flawed in that regard. I myself run Linux, where character encoding is
> handled correctly and UTF-8 is the default. No idea how they do it on a Mac.
>
> So, I think using OpenOffice is your best bet. Or just open the CSV file
> you have in Notepad++ (or a similar text editor), save the file as UTF-8 and
> fix the problems manually. But then you'd have to do that every time you
> want to change something.
>
> Cheers,
>
> Koen
>
> ------------------------------
> *From:* [email protected] <[email protected]> on behalf of Lucy FJ <[email protected]>
> *Sent:* Sunday 24 January 2016 12:28
> *To:* Arches Project
> *Subject:* Re: [Arches] Diacriticals in authority and .Arches files problems
>
> Hi Koen,
>
> Thank you for this information.
> I did try out some of the suggestions on Google for using Excel to create
> UTF-8 files, because I like using Excel and know it well, but the ones I
> tried are overcomplicated and produce a CSV file in UTF-8 with BOM format,
> which I believe will not work in Arches. It looks like I will need to
> download OpenOffice as you suggest. Must all files loaded into Arches be
> UTF-8 only?
>
> Lucy
>
> On Friday, January 22, 2016 at 4:24:42 PM UTC+2, Koen Van Daele wrote:
>>
>> Hi Lucy,
>>
>> As far as I know, Excel (all versions) is notoriously bad at handling
>> things like character encodings. This rather old Stack Overflow question
>> seems to confirm that:
>>
>> http://stackoverflow.com/questions/4221176/excel-to-csv-with-utf8-encoding
>>
>> It does offer some workarounds, but none of them are very nice.
>>
>> I would suggest writing your CSV files with LibreOffice/OpenOffice. You
>> should be able to install it and it's free. While it's not always an exact
>> replacement for Excel, when it comes to character encodings, it just works.
>> By default it will save things as UTF-8 (at least under Linux it does) and
>> it will ask you if you want to save in a different encoding.
>>
>> Cheers,
>>
>> Koen
>>
>> On Friday 22 January 2016 15:05:52 UTC+1, Lucy FJ wrote:
>>>
>>> Hi Adam and Alexei,
>>>
>>> I forgot to add that the diacriticals are in the altnames at rows 132 to
>>> 136 when editing in Excel.
>>>
>>> Lucy
>>>
>>> ----- Original Message -----
>>> *From:* Adam Cox
>>> *To:* Lucy Fletcher-Jones
>>> *Cc:* Alexei Peters ; Arches Project
>>> *Sent:* Thursday, January 21, 2016 5:36 PM
>>> *Subject:* Re: [Arches] Diacriticals in authority and .Arches files problems
>>>
>>> Hi Lucy, you can check the encoding in Notepad++. Open your authority
>>> document with that program, and click the Encoding menu. Your file should
>>> be in "UTF-8" or "UTF-8 without BOM" (depends on the version of Notepad++
>>> you have).
>>> The î character should work as far as I know...
>>>
>>> On Thu, Jan 21, 2016 at 7:18 AM, 'Lucy Fletcher-Jones' via Arches
>>> Project <[email protected]> wrote:
>>>
>>>> Hi Alexei,
>>>>
>>>> Thank you for looking into this. I am glad to hear that Arches should
>>>> support diacriticals.
>>>>
>>>> Here is the error message on loading the 'Ruler' Authority document:
>>>>
>>>> RULER_AUTHORITY_DOCUMENT.csv
>>>>
>>>> ERRORS IN FILE: RULER_AUTHORITY_DOCUMENT.values.csv
>>>>
>>>> ERRORS IN FILE: RULER_AUTHORITY_DOCUMENT.csv
>>>>
>>>> ERROR: Make sure the file is saved with UTF-8 encoding
>>>> 'utf8' codec can't decode byte 0xea in position 30: invalid continuation byte
>>>> Traceback (most recent call last):
>>>>   File "/opt/projects/ENV/lib/python2.7/site-packages/arches/management/commands/package_utils/authority_files.py", line 112, in load_authority_file
>>>>     for row in rows:
>>>>   File "/opt/projects/ENV/lib/python2.7/site-packages/unicodecsv/py2.py", line 217, in next
>>>>     row = csv.DictReader.next(self)
>>>>   File "/usr/local/lib/python2.7/csv.py", line 104, in next
>>>>     row = self.reader.next()
>>>>   File "/opt/projects/ENV/lib/python2.7/site-packages/unicodecsv/py2.py", line 128, in next
>>>>     for value in row]
>>>>   File "/opt/projects/ENV/lib/python2.7/encodings/utf_8_sig.py", line 22, in decode
>>>>     (output, consumed) = codecs.utf_8_decode(input, errors, True)
>>>> UnicodeDecodeError: 'utf8' codec can't decode byte 0xea in position 30: invalid continuation byte
>>>>
>>>> ERROR in row 31 (Legacyoid (RULER_UID:30) not found. Make sure your
>>>> ParentConceptid in the
>>>>
>>>> This caused further errors in the Ruler Values files, as can be seen
>>>> from above.
>>>> I do not have a copy of the authority file that caused the error as I
>>>> have since corrected it and changed it in a few places.
>>>> But the alternative name was
>>>>
>>>> Ptolemaîos Philadelphos
>>>>
>>>> and I believe it was the circumflex above the 'i' that caused the
>>>> problem. Certainly when I removed the circumflex, the file loaded OK.
>>>>
>>>> Thank you,
>>>> Lucy
>>>>
>>>> ----- Original Message -----
>>>> *From:* Alexei Peters
>>>> *To:* Lucy FJ
>>>> *Cc:* Arches Project
>>>> *Sent:* Wednesday, January 20, 2016 8:24 PM
>>>> *Subject:* Re: [Arches] Diacriticals in authority and .Arches files problems
>>>>
>>>> Hi Lucy,
>>>> The .arches file should support diacritics. I'm actually surprised
>>>> that the authority files don't. I just tested a local file and I was able
>>>> to add these records:
>>>>
>>>> conceptid,PrefLabel,AltLabels,ParentConceptid,ConceptType,Provider
>>>> 20000001-0000-0000-0000-000000000000,Portland,,CITY_AUTHORITY_DOCUMENT.csv,Index,GCI
>>>> 20000002-0000-0000-0000-000000000000,San Francisco,The Bay Area,CITY_AUTHORITY_DOCUMENT.csv,Index,GCI
>>>> 20000003-0000-0000-0000-000000000000,San Jose,San José,CITY_AUTHORITY_DOCUMENT.csv,Index,GCI
>>>>
>>>> Notice that the alt label for San Jose is San José.
>>>>
>>>> Can you share the authority file that you're having trouble with?
>>>> Cheers,
>>>> Alexei
>>>>
>>>> Director of Web Development - Farallon Geographics, Inc. - 971.227.3173
>>>>
>>>> On Wed, Jan 20, 2016 at 12:32 AM, Lucy FJ <[email protected]> wrote:
>>>>
>>>>> Hi all,
>>>>> We have been loading customised authority files and have noticed that
>>>>> Arches rejects words with diacriticals (accents etc.). This is not a
>>>>> problem for us, as we were happy to remove them, and if we really want
>>>>> them we can enter them through the RDM. But will this problem occur when
>>>>> loading resource data through .arches? We need to input place names as
>>>>> alternative names using diacriticals, and it would be much easier if we
>>>>> can do this via .arches files.
>>>>> We know we can input them using the resource data manager, but
>>>>> obviously when dealing with about 3000 entries, this is time consuming.
>>>>> Any ideas?
>>>>> Lucy
>>>>>
>>>>> --
>>>>> -- To post, send email to [email protected]. To unsubscribe,
>>>>> send email to [email protected]. For more information,
>>>>> visit https://groups.google.com/d/forum/archesproject?hl=en
>>>>> ---
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "Arches Project" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>> For more options, visit https://groups.google.com/d/optout.
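
For anyone who would rather script the "re-save as UTF-8" step discussed in this thread than do it by hand in a text editor, here is a minimal Python 3 sketch. It is not part of Arches (whose traceback above is from Python 2.7), and it rests on assumptions: that the file Excel wrote is in Windows-1252 (`cp1252`, which covers the printable ISO-8859-1 characters), and that a UTF-8 file without a BOM is the desired result. The names `to_utf8_bytes` and `convert_file` are made up for illustration.

```python
import codecs
from pathlib import Path


def to_utf8_bytes(raw, fallback_encoding="cp1252"):
    """Return raw re-encoded as UTF-8 without a BOM.

    fallback_encoding is only a guess (an assumption) for input that is
    not valid UTF-8, e.g. a CSV exported by Excel on Windows.
    """
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]    # strip the byte order mark
    try:
        text = raw.decode("utf-8")           # already UTF-8: nothing to fix
    except UnicodeDecodeError:
        text = raw.decode(fallback_encoding)  # assume the legacy encoding
    return text.encode("utf-8")


def convert_file(src, dst, fallback_encoding="cp1252"):
    """Write a UTF-8 copy of src to dst (both paths are hypothetical)."""
    data = Path(src).read_bytes()
    Path(dst).write_bytes(to_utf8_bytes(data, fallback_encoding))


# A single-byte encoding stores the accented letter as one byte,
# which a strict UTF-8 decoder rejects.
legacy = "Ptolemaîos Philadelphos".encode("cp1252")
try:
    legacy.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)

fixed = to_utf8_bytes(legacy)
print(fixed.decode("utf-8"))
```

A call such as `convert_file("RULER_AUTHORITY_DOCUMENT.csv", "RULER_AUTHORITY_DOCUMENT.utf8.csv")` would then produce a copy to load instead of the original (the output name is only an example). If the source file is really in some other encoding, the fallback will decode without error but give the wrong characters, so it is worth checking the result by eye.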
