I've also received the same UTF8 error when importing legacy accession records that have *valid* diacritical marks in the title and/or agent name.
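Before importing, one quick way to test Terry's question below (is the legacy data actually UTF8?) is to decode the raw bytes as UTF-8. Only if that fails, decode them from a legacy encoding and re-save them as UTF-8. The Python sketch below assumes the legacy encoding is Windows codepage 1252 (cp1252). That choice, the `to_utf8` and `convert_file` names, and the file handling are illustrative assumptions, not part of ArchivesSpace. Python has no built-in MARC-8 codec, so MARC-8 data would need a dedicated converter.

```python
from __future__ import annotations


def to_utf8(raw: bytes, fallback: str = "cp1252") -> tuple[str, bool]:
    """Decode legacy bytes, returning (text, converted).

    converted is True when the bytes were not valid UTF-8 and the
    fallback encoding (assumed to be cp1252 here) was used instead.
    """
    try:
        # Valid UTF-8 (including plain ASCII) is kept unchanged.
        return raw.decode("utf-8"), False
    except UnicodeDecodeError:
        # Assumption: non-UTF-8 data is cp1252. Bytes that cp1252 leaves
        # undefined (e.g. 0x81) still raise UnicodeDecodeError, which
        # flags the record for manual review instead of guessing.
        return raw.decode(fallback), True


def convert_file(src: str, dst: str) -> bool:
    """Hypothetical helper: write src's text to dst as UTF-8.

    Returns True if a fallback conversion was needed.
    """
    with open(src, "rb") as f:
        text, converted = to_utf8(f.read())
    # newline="" keeps the original line endings untouched.
    with open(dst, "w", encoding="utf-8", newline="") as f:
        f.write(text)
    return converted


if __name__ == "__main__":
    legacy = "Café Müller".encode("cp1252")  # diacritics stored as cp1252 bytes
    print(to_utf8(legacy))                   # ('Café Müller', True)
```

Diacritics survive this conversion, because they were valid characters that were stored in a different encoding. Records that still fail are the ones worth correcting by hand.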
Lisa

On Wed, Feb 15, 2017 at 2:17 PM, Reese, Terry P. <reese.2...@osu.edu> wrote:

> I guess my question would be: is your legacy data UTF8? For whatever reason, I've found that historically, archives have often used other character sets when encoding their EAD files (though to be fair, I see this in MARC records as well: confusion between MARC8, ISO8859-1, and codepage 1252). The simple solution (and this would maintain your characters) would be to convert the character set to UTF8. Otherwise, even if you held on to these values, they wouldn't display in any form that you could read. In fact, that is what the error message is trying to tell you: as a UTF8 value, your data is going to be gibberish, regardless of whether you keep it or not.
>
> --tr
>
> *From:* archivesspace_users_group-boun...@lyralists.lyrasis.org [mailto:archivesspace_users_group-boun...@lyralists.lyrasis.org] *On Behalf Of* Stasiulatis, Suzanne
> *Sent:* Wednesday, February 15, 2017 3:12 PM
> *To:* Archivesspace Users Group <archivesspace_users_group@lyralists.lyrasis.org>
> *Subject:* Re: [Archivesspace_Users_Group] Enumerations Findings
>
> I totally agree that we shouldn't have special characters if at all possible, but a large amount of our legacy data uses them. Especially in titles, staff want to use those characters as they appear on the original materials.
>
> Suzanne
>
> *From:* archivesspace_users_group-boun...@lyralists.lyrasis.org [mailto:archivesspace_users_group-boun...@lyralists.lyrasis.org] *On Behalf Of* Reese, Terry P.
> *Sent:* Wednesday, February 15, 2017 2:58 PM
> *To:* Archivesspace Users Group
> *Subject:* Re: [Archivesspace_Users_Group] Enumerations Findings
>
> Why would you want to retain invalid special characters?
> My guess is that one of the reasons for this error is that invalid characters would cause problems with search indexing, as well as affect display and export. I would think you'd want to use the error as a flag to identify data that needs to be corrected. Or am I missing something?
>
> --tr
>
> *From:* archivesspace_users_group-boun...@lyralists.lyrasis.org [mailto:archivesspace_users_group-boun...@lyralists.lyrasis.org] *On Behalf Of* Stasiulatis, Suzanne
> *Sent:* Wednesday, February 15, 2017 2:52 PM
> *To:* Archivesspace Users Group <archivesspace_users_group@lyralists.lyrasis.org>
> *Subject:* Re: [Archivesspace_Users_Group] Enumerations Findings
>
> This also came up for me recently. If invalid special characters are present in the content titles, I get this error. I'm not quite sure how to adjust the import to accept those special characters.
>
> *Suzanne Stasiulatis* | Archivist II
> Pennsylvania Historical and Museum Commission | Pennsylvania State Archives
> 350 North Street | Harrisburg, PA 17120-0090
> Phone: 717-787-5953
> http://www.phmc.pa.gov
> sustasi...@pa.gov
>
> *From:* archivesspace_users_group-boun...@lyralists.lyrasis.org [mailto:archivesspace_users_group-boun...@lyralists.lyrasis.org] *On Behalf Of* Majewski, Steven Dennis (sdm7g)
> *Sent:* Wednesday, February 15, 2017 2:36 PM
> *To:* Archivesspace Users Group
> *Subject:* Re: [Archivesspace_Users_Group] Enumerations Findings
>
> We have run into cases where some EAD attribute values are required to be NMTOKENs, so they cannot contain embedded spaces or other disallowed characters. We replaced the embedded spaces in those enumeration values with underscores.
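Steve's workaround replaces embedded spaces with underscores so that each value is a legal NMTOKEN. A small normalizer could automate it. This Python sketch only approximates the XML 1.0 NMTOKEN rule. It keeps Unicode letters and digits plus `.`, `-`, `_` and `:` and drops everything else, while the full rule also admits combining marks and a few other characters. The `to_nmtoken` name and the optional lowercasing are assumptions for illustration. The lowercasing mirrors the UNLV convention described further down.

```python
import re


def to_nmtoken(value: str, lowercase: bool = False) -> str:
    """Turn an enumeration value into an NMTOKEN-safe string (approximation)."""
    # Collapse each run of whitespace into a single underscore.
    token = re.sub(r"\s+", "_", value.strip())
    # Keep only characters that are (approximately) XML NameChars.
    token = "".join(ch for ch in token if ch.isalnum() or ch in "._-:")
    if not token:
        # An NMTOKEN needs at least one character.
        raise ValueError(f"cannot build an NMTOKEN from {value!r}")
    return token.lower() if lowercase else token


if __name__ == "__main__":
    for raw in ["Oversized Box", "Linear  Feet", "Box (large)"]:
        print(repr(raw), "->", to_nmtoken(raw, lowercase=True))
```

Whitespace is collapsed before filtering, so variants such as `Linear  Feet` and `Linear Feet` normalize to the same token. This also helps with the duplicate-value cleanup Carlos describes.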
>
> This has only come to my attention in the last week or so, so I haven't made a thorough investigation of which attributes or which enumerations this applies to. I've just fixed them as I've encountered that error.
>
> So it may be intentional that it is using the non-translated value.
>
> (And I wouldn't be surprised if, for simplicity, it is over-applying that rule in places where it's not actually required.)
>
> - Steve.
>
> On Feb 15, 2017, at 2:09 PM, Carlos Lemus <carlos.le...@unlv.edu> wrote:
>
> Hello,
>
> At UNLV Special Collections, we've been working on cleaning up our enumeration values because in many cases there were duplicates caused by imports (e.g., value: linear_feet vs. value: Linear feet vs. Linear Feet). We wanted to stick as closely as possible to ArchivesSpace standards. We decided to make our enumeration values all lowercase and separated by underscores, then merge any records with incorrect enumerations into the correct value (e.g., value: linear Feet into linear_feet). We also have some custom enumerations, such as value: oversized_box, translation: Oversized Box; and value: digital_file, translation: Digital File.
>
> Once we had that set up correctly, we noticed a few things and were wondering whether anyone has experienced the same or has a standard we could use.
>
> 1. When generating PDFs and EADs, the custom enumeration values (such as oversized_box) came out as the machine-readable oversized_box instead of our local en.yml value (located in the local plugin).
>
> I traced this to the EAD serializer (https://github.com/archivesspace/archivesspace/blob/master/backend/app/exporters/serializers/ead.rb#L490). I was able to create a temporary workaround, but it required altering the enumeration instead of referencing our file.
> I thought I'd point it out because anyone who creates custom enumerations, even with a translation in an en.yml file, would not see their change reflected in the EAD export. (I've attached an image showing this.) Has anyone else experienced this?
>
> 2. Another example of this was the container "type" attribute. Previously, a value like Oversized Box was exported to EAD as is, because that was its value in the enumeration. After we corrected the value to oversized_box, it was exported to the EAD container "type" as is and carried over into the PDF as well. With some XSLT manipulation I was able to get it to show up as "oversized box" (shown in attachments). I've looked through https://www.loc.gov/ead/tglib/elements/container.html and cannot find an example of an attribute value with two or more words.
>
> Should attributes be machine-readable (e.g., oversized_box) or human-readable (Oversized Box), or does it even matter? Of course, exporting it as Oversized Box would make it easiest to present a user-friendly version to the user.
>
> Please excuse the lengthy post; I'm trying to be thorough with my explanation. Let me know if you've come across something similar or have a definitive solution.
>
> Carlos Lemus
> Application Programmer, Special Collections Technical Services
> University Libraries, University of Nevada, Las Vegas
>
> *How often have I said to you that when you have eliminated the impossible, whatever remains, however improbable, must be the truth?
> - Sherlock Holmes*
>
> <enumeration_ead.PNG><containers_enum.PNG>
> _______________________________________________
> Archivesspace_Users_Group mailing list
> Archivesspace_Users_Group@lyralists.lyrasis.org
> http://lyralists.lyrasis.org/mailman/listinfo/archivesspace_users_group

--
Head of Archival Processing
University of Minnesota Libraries
Archives and Special Collections
Elmer L. Andersen Library, Suite 315
222-21st Ave. S.
Minneapolis MN 55455
Phone: 612.626.2531
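Carlos's first point is that custom values such as oversized_box reach the EAD and PDF without their en.yml translation. Fixing that comes down to a lookup with a fallback. The Python sketch below is not the ArchivesSpace exporter. `TRANSLATIONS` is a hypothetical stand-in for what a local plugin's en.yml might contain. Its nesting (enumeration name, then value) and the enumeration names in it are assumptions, and `display_label` is an illustrative name.

```python
# Hypothetical stand-in for translations loaded from a local en.yml.
# Assumed layout: {enumeration_name: {value: human-readable label}}.
TRANSLATIONS = {
    "container_type": {
        "oversized_box": "Oversized Box",
        "digital_file": "Digital File",
    },
    "extent_extent_type": {"linear_feet": "Linear Feet"},
}


def display_label(enum_name: str, value: str, translations: dict = TRANSLATIONS) -> str:
    """Return the translated label for an enumeration value, or a readable fallback."""
    label = translations.get(enum_name, {}).get(value)
    if label is not None:
        return label
    # Fallback: turn a machine-readable value such as "map_case" into "Map Case".
    return " ".join(part.capitalize() for part in value.split("_") if part)


if __name__ == "__main__":
    print(display_label("container_type", "oversized_box"))
    print(display_label("container_type", "map_case"))  # no translation, uses fallback
```

One way to reconcile Carlos's question with Steve's point about NMTOKEN attributes is to keep the machine-readable value in exported attributes. A lookup like this would then apply only where text is shown to people, for example in the PDF.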