Exactly what Adam said!
To add to that, there's no single font (or even font family) that has glyphs for every single Unicode character. The Noto font family aims to do that "in the future," however, and it already includes a lot of fonts (including Noto Sans Thai and Noto Serif Thai) that one would have to install. See https://www.google.com/get/noto/

In any event, ASpace should certainly be updated so that the staff-side PDFs have more coverage by default (though I also think there needs to be a decision about whether the platform supports both EAD-to-PDF transformations and HTML/CSS-to-PDF transformations), but the out-of-the-box approach is never going to cover everything. Perhaps a good next step would be to update Apache FOP (since the version used by ASpace is pretty out of date right now), package ASpace with a few of the Noto fonts so that those could be used in place of the base-14 fonts (e.g. Times is used by FOP for its "any" font), and update the transformation process. Even then, though, I believe that you would actually need to embed the fonts into the PDF file, since if you don't, there's no guarantee that whoever opens the PDF file has that font on their computer, so you might still wind up with character replacements. But the PDF standard allows you to do just that.

Last, EAD3 added language and script data attributes for precisely this sort of reason (e.g. if you have one paragraph in English, and another in Arabic, you'd need some reliable method to determine when to switch fonts and the direction of the text). ASpace doesn't have that ability yet (although I'm pretty sure that AtoM does), but it would be a great addition (as well as a necessary one, for this sort of reason). Here's a note from EAD3's tag library: "Support for multilingual description was addressed by adding @lang and @script attributes to all non-empty elements in EAD3, making it possible to explicitly state what language or script is used therein. 
Additionally, some elements were modified to allow them to repeat where previously they did not, thus enabling the inclusion of the same data in multiple languages."

So, lots to do, but all worth doing.

________________________________
From: archivesspace_users_group-boun...@lyralists.lyrasis.org <archivesspace_users_group-boun...@lyralists.lyrasis.org> on behalf of Adam Jazairi <jaza...@bc.edu>
Sent: Thursday, January 24, 2019 1:45:38 PM
To: Archivesspace Users Group
Subject: Re: [Archivesspace_Users_Group] Thai names in Finding Aid PDF

Hi Ed,

This is a problem with the fonts included in the version of Apache FOP that ASpace uses. There's an open ticket here: https://archivesspace.atlassian.net/browse/ANW-473

We've encountered the same issue when we attempt to generate a PDF finding aid containing Irish or Japanese diacritics. An interim solution we've been using is to export the EAD, then run Saxon (http://saxon.sourceforge.net/#F9.9HE) on it to generate an FO file, then run FOP 1.0 (https://xmlgraphics.apache.org/fop/1.0/) with the appropriate font on the FO file to generate the PDF.
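A minimal sketch of the three-step workflow just described (export the EAD, Saxon to FO, FOP to PDF), written in Python only to show the order of the steps. The file names, the stylesheet name, and the `saxon9he.jar` location are assumptions for illustration; the Saxon `-s:`/`-xsl:`/`-o:` options and the FOP `-c`/`-fo`/`-pdf` options are the standard command-line switches for those tools, but check them against the versions you have installed.

```python
import subprocess
from pathlib import Path


def build_commands(ead_path, xslt_path, xconf_path, saxon_jar="saxon9he.jar"):
    """Return the two command lines: EAD -> FO (Saxon), then FO -> PDF (FOP).

    All paths are placeholders; saxon_jar is an assumed location for the Saxon HE jar.
    """
    ead = Path(ead_path)
    fo_path = ead.with_suffix(".fo")
    pdf_path = ead.with_suffix(".pdf")

    # Step 1: apply an EAD-to-FO stylesheet to the exported EAD with Saxon.
    saxon_cmd = [
        "java", "-jar", saxon_jar,
        f"-s:{ead}",
        f"-xsl:{xslt_path}",
        f"-o:{fo_path}",
    ]

    # Step 2: render the FO file with FOP; -c points at the config that registers the font.
    fop_cmd = [
        "fop",
        "-c", str(xconf_path),
        "-fo", str(fo_path),
        "-pdf", str(pdf_path),
    ]
    return saxon_cmd, fop_cmd


def run_pipeline(ead_path, xslt_path, xconf_path):
    """Run both steps in order, stopping at the first failure."""
    for cmd in build_commands(ead_path, xslt_path, xconf_path):
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline("finding_aid.xml", "ead-to-fo.xsl", "fop.xconf")` would leave `finding_aid.fo` and `finding_aid.pdf` next to the exported EAD, provided Java, Saxon, and FOP are available on the machine.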
It's a bit cumbersome, but it's worked for us so far. Here's the FOP conf file that we use: https://github.com/BCDigLib/bc-aspace/blob/master/fop/fop.xconf

The only catch is that you'll need a font that supports the Unicode characters you need. In your case, it looks like Arial v2.95 would work: https://en.wikipedia.org/wiki/Arial#TrueType/OpenType_version_history

Hope this helps.

Adam

On Thu, Jan 24, 2019 at 1:14 PM Tang, Lydia <lta...@lib.msu.edu> wrote:

Hi Ed,

The related ticket that I see is here: https://archivesspace.atlassian.net/browse/ANW-294?jql=text%20~%20%22pdf%20diacritics%22

It is “closed – completed”. It doesn’t look like Marcella’s ticket was ever created. Ed, please go ahead and create a new ticket! Thanks for pointing this out!

Lydia – on behalf of Dev. Pri.
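Adam's catch above, that the font has to cover the characters you actually use, can be checked before any PDF is generated. The following is a rough sketch, not FOP's real glyph lookup: it assumes the default base-14 fonts cover roughly the Windows-1252 repertoire and lists every character outside it, which approximates the set that would come out as "#". The function names and the sample string are hypothetical.

```python
import unicodedata


def chars_outside_winansi(text):
    """Return the distinct characters in text that cp1252 cannot encode, in first-seen order.

    Assumption: cp1252 stands in for what the base-14 fonts can display.
    """
    missing = []
    for ch in unicodedata.normalize("NFC", text):
        try:
            ch.encode("cp1252")
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing


def describe(chars):
    """Label each character with its code point and Unicode name for a report."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}" for ch in chars]


# Hypothetical agent name containing one letter with a macron.
sample = "Krung Th\u0113p"
for line in describe(chars_outside_winansi(sample)):
    print(line)
```

Running a check like this over the agent names in an exported EAD file would give a list of the characters that a replacement font needs to include.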
From: <archivesspace_users_group-boun...@lyralists.lyrasis.org> on behalf of "Busch, Edward" <busch...@msu.edu>
Reply-To: Archivesspace Users Group <archivesspace_users_group@lyralists.lyrasis.org>
Date: Thursday, January 24, 2019 at 12:53 PM
To: "'archivesspace_users_group@lyralists.lyrasis.org'" <archivesspace_users_group@lyralists.lyrasis.org>
Subject: [Archivesspace_Users_Group] Thai names in Finding Aid PDF

I’m not sure if there is an open ticket on this or not; a quick search didn’t reveal anything directly. Agents with Thai names and diacritics look correct in ASpace but when generated into a PDF finding aid, do not. They end up like: Saph# K#ns#ks# h#ng Ch#t

I can create a ticket if needed.

Ed Busch, MLIS
Electronic Records Archivist
Michigan State University Archives
Conrad Hall
943 Conrad Road, Room 101
East Lansing, MI 48824
517-884-6438
busch...@msu.edu

_______________________________________________
Archivesspace_Users_Group mailing list
Archivesspace_Users_Group@lyralists.lyrasis.org
http://lyralists.lyrasis.org/mailman/listinfo/archivesspace_users_group

--
Adam Jazairi
Digital Repository Services
Boston College Libraries
(617) 552-1404
adam.jaza...@bc.edu
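On the EAD3 @lang and @script point raised at the top of this thread, here is a small sketch of how a transformation might use @script to decide when to switch fonts and text direction. It is an illustration under stated assumptions, not how ASpace or AtoM works: the mapping from script codes to fonts is hypothetical, the sample fragment is made up, and the function ignores namespaces, mixed content, and tail text.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from ISO 15924 script codes to a font and a text direction.
SCRIPT_STYLES = {
    "Latn": ("Noto Serif", "ltr"),
    "Thai": ("Noto Serif Thai", "ltr"),
    "Arab": ("Noto Naskh Arabic", "rtl"),
}
DEFAULT_STYLE = ("Noto Serif", "ltr")


def style_runs(ead_fragment):
    """Return (text, font, direction) for each element that has leading text of its own.

    An element without @script inherits the nearest ancestor's value.
    """
    root = ET.fromstring(ead_fragment)
    results = []

    def walk(elem, inherited_script):
        script = elem.get("script", inherited_script)
        text = (elem.text or "").strip()
        if text:
            font, direction = SCRIPT_STYLES.get(script, DEFAULT_STYLE)
            results.append((text, font, direction))
        for child in elem:
            walk(child, script)

    walk(root, None)
    return results


# Made-up fragment: one paragraph in English, one in Arabic.
sample = """<scopecontent>
  <p lang="eng" script="Latn">Correspondence and reports.</p>
  <p lang="ara" script="Arab">مراسلات وتقارير</p>
</scopecontent>"""

for text, font, direction in style_runs(sample):
    print(font, direction, text)
```

In an XSL-FO stylesheet the same decision would end up as `font-family` and `direction` properties on the generated blocks; the Python here only shows the lookup logic.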