Hi Community,
Let's celebrate BBEdit's 30 years of existence.
Here is a Swift text filter that could help you prepare your InDesign
birthday cards.
It is based on Unicode's emoji regular expression and on the ICU regular
expression engine that Swift reaches through NSRegularExpression.
Save it as:
~/Library/Application Support/BBEdit/Text Filters/encode_emojis.swift
#!/usr/bin/env swift
// Based on: https://unicode.org/reports/tr51/#EBNF_and_Regex
//
// Changed \p{Emoji} to \p{Basic_Emoji} to avoid matching '#', numbers, etc.
// Tweaked to match uncovered cases revealed by test files.
//
// Tested against the contents of those test files:
// ------------------------------------------------
// https://unicode.org/emoji/charts/full-emoji-list.html
// https://raw.githubusercontent.com/unicode-org/icu/main/icu4c/source/data/unidata/emoji-sequences.txt
// https://raw.githubusercontent.com/unicode-org/icu/main/icu4c/source/data/unidata/emoji-zwj-sequences.txt
// https://unicode.org/Public/emoji/14.0/emoji-test.txt
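// example: France 🇫🇷, Snail 🐌, Family👨‍👩‍👧‍👦, man technologist with skin tone 👨🏼‍💻
// decimal: France (ef)&#127467;&#127479;(\ef), Snail (ef)&#128012;(\ef),
//          Family(ef)&#128104;&#8205;&#128105;&#8205;&#128103;&#8205;&#128102;(\ef),
//          man technologist with skin tone (ef)&#128104;&#127996;&#8205;&#128187;(\ef)
// hex:     France (ef)&#x1F1EB;&#x1F1F7;(\ef), Snail (ef)&#x1F40C;(\ef),
//          Family(ef)&#x1F468;&#x200D;&#x1F469;&#x200D;&#x1F467;&#x200D;&#x1F466;(\ef),
//          man technologist with skin tone (ef)&#x1F468;&#x1F3FC;&#x200D;&#x1F4BB;(\ef)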
import Foundation
let useDecimalEntities = true     // Change to false to encode as hexadecimal entities.
let openingWrapperTag = "(ef)"    // Set to "" if no wrapper tag is needed.
let closingWrapperTag = "(\\ef)"  // Set to "" if no wrapper tag is needed.
let pattern = #"""
(?x-i)
(?:
\p{RI} \p{RI}
|
[
\x{00A9}
\x{00AE}
\x{203C}
\x{2049}
\x{2122}
\x{2139}
\x{2194}
\x{2195}
\x{2196}
\x{2197}
\x{2198}
\x{2199}
\x{21A9}
\x{21AA}
\x{2328}
\x{23CF}
\x{23ED}
\x{23EE}
\x{23EF}
\x{23F1}
\x{23F2}
\x{23F8}
\x{23F9}
\x{23FA}
\x{24C2}
\x{25AA}
\x{25AB}
\x{25B6}
\x{25C0}
\x{25FB}
\x{25FC}
\x{2702}
\x{2708}
\x{2709}
\x{270F}
\x{2712}
\x{2714}
\x{2716}
\x{271D}
\x{2721}
\x{2733}
\x{2734}
\x{2744}
\x{2747}
\x{2763}
\x{27A1}
\x{2934}
\x{2935}
\x{2B05}
\x{2B06}
\x{2B07}
\x{3030}
\x{303D}
\x{3297}
\x{3299}
\x{1F170}
\x{1F171}
\x{1F17E}
\x{1F17F}
\x{1F202}
\x{1F237}
]
\x{FE0F}
|
[
\x{0023}
\x{002A}
\x{0030}
\x{0031}
\x{0032}
\x{0033}
\x{0034}
\x{0035}
\x{0036}
\x{0037}
\x{0038}
\x{0039}
]
\x{FE0F} \x{20E3}
|
[
\p{Basic_Emoji}
\x{1F300}-\x{1F5FF}
\x{1F3CA}-\x{1F3CC}
\x{1F3F3}
\x{1F3F4}
\x{1F441}
\x{1F574}
\x{1F575}
\x{1F590}
\x{1F680}-\x{1F6FF}
\x{2600}-\x{26FF}
\x{261D}
\x{26F9}
\x{270C}
\x{270D}
\x{2764}
]
(?:
\p{EMod}
|
\x{FE0F} \x{20E3}?
|
[\x{E0020}-\x{E007E}]+
\x{E007F}
)?
(?:
\x{200D}
[
\p{Basic_Emoji}
\x{1F32B}
\x{1F5E8}
\x{2620}
\x{2640}
\x{2642}
\x{2695}
\x{2696}
\x{26A7}
\x{2708}
\x{2744}
\x{2764}
]
(?:
\p{EMod}
|
\x{FE0F} \x{20E3}?
|
[\x{E0020}-\x{E007E}]+
\x{E007F}
)?
)*
)
"""#
// Compile the pattern once; a malformed pattern aborts the script.
let regex = try NSRegularExpression(pattern: pattern, options: [])
var output: [String] = []

while var line = readLine() {
    let range = NSRange(line.startIndex..<line.endIndex, in: line)
    let matches = regex.matches(in: line, options: [], range: range)
    // Replace from the last match backwards so that the ranges of earlier
    // matches stay valid while the line is being modified.
    for match in matches.reversed() {
        if let range = Range(match.range, in: line) {
            let emoji = line[range]
            // One numeric entity per Unicode scalar of the matched sequence.
            let entities = emoji.unicodeScalars.map { scalar -> String in
                useDecimalEntities
                    ? "&#\(scalar.value);"
                    : "&#x\(String(scalar.value, radix: 16, uppercase: true));"
            }
            let replacement = entities.joined()
            line.replaceSubrange(range, with: "\(openingWrapperTag)\(replacement)\(closingWrapperTag)")
        }
    }
    output.append(line)
}
print(output.joined(separator: "\n"), terminator: "")
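
The heart of the filter is the step that turns one matched emoji sequence into numeric entities, one per Unicode scalar. A minimal sketch of that step on its own, without the regular expression, might look like the following. The helper name `encodeScalars` is hypothetical and is not part of the filter above.

```swift
// Hypothetical helper (not part of the filter above): encode every Unicode
// scalar of `text` as a decimal or hexadecimal numeric entity.
func encodeScalars(_ text: String, decimal: Bool) -> String {
    return text.unicodeScalars.map { scalar -> String in
        decimal
            ? "&#\(scalar.value);"
            : "&#x\(String(scalar.value, radix: 16, uppercase: true));"
    }.joined()
}

// Man technologist with medium-light skin tone:
// U+1F468 MAN, U+1F3FC skin tone modifier, U+200D ZWJ, U+1F4BB LAPTOP
let manTechnologist = "\u{1F468}\u{1F3FC}\u{200D}\u{1F4BB}"

print(encodeScalars(manTechnologist, decimal: true))
// &#128104;&#127996;&#8205;&#128187;
print(encodeScalars(manTechnologist, decimal: false))
// &#x1F468;&#x1F3FC;&#x200D;&#x1F4BB;
```

Because the whole sequence is encoded in one pass, the wrapper tags can be placed around the complete run of entities rather than around each code.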
--
BBEdit rocks!
Kind regards,
Jean Jourdain
On Tuesday, April 12, 2022 at 5:38:38 PM UTC+2 Justin Ross wrote:
> Sorted it.
>
> The only way for it to work with extra text on either end is to move all
> the single entity codes (e.g. &#128102;) below the multiple entity ones
> (e.g. &#128102;&#127995;) in the canonize file.
>
> So it would look something like this...
>
> &#128102;&#127995;
> &#1255;&#127967;
> &#1153;
> &#128102;
> &#128103;
> &#128104;
>
> On Sunday, 10 April 2022 at 20:48:51 UTC+1 Justin Ross wrote:
>
>> Hi all, I'm looking for a way to catch any emoji that's used amongst
>> regular text. This is so that I can create an XML file to import into
>> InDesign. Then I simply find/replace any emoji found and convert the
>> character to an emoji font so it can be printed.
>>
>> I've made a canonize file with over 1,000 emoji, separated by a tab, then
>> the decimal equivalent.
>>
>> This works great.
>>
>> *For example:*
>> This emoji is found in the text somewhere:
>> 👦
>>
>> and is changed to:
>> &#128102;
>>
>> It also works for skintone emoji where the decimal code can be repeated.
>>
>> This emoji is found in the text somewhere:
>> 👦🏻
>>
>> and is changed to:
>> &#128102;&#127995;
>>
>>
>> However, as soon as I wrap the code (so it's easier to find/change in
>> InDesign), the duplicate codes cause a problem.
>>
>> For example:
>>
>> This emoji:
>> 👦
>>
>> Is changed to this:
>> (ef)&#128102;(\ef)
>>
>> *BUT...*
>>
>> This:
>> 👦🏻
>>
>> Is changed to this:
>> (ef)&#128102;(\ef)(ef)&#127995;(\ef)
>>
>> Note the extra (\ef)(ef) in the middle.
>>
>> Now I could use a find/replace to remove that bit. But what if there are
>> two different emoji next to each other? I'm replacing one problem with
>> another.
>>
>> Is there a way round this?
>>
>> Many thanks if anyone can help.
>>
>>
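
The ordering effect described in the quoted messages can be reproduced with a small sketch. It assumes that the canonize rules are applied one after another in file order, which is what Justin's observation suggests but is not confirmed here. The names `replaceAll`, `canonize` and `Rule`, and the two-entry table, are hypothetical.

```swift
// Hypothetical helper: replace every occurrence of `needle` in `text`,
// comparing Unicode scalars one by one (no grapheme clustering), so that a
// base emoji is found even when a skin tone modifier follows it.
func replaceAll(_ needle: String, with replacement: String, in text: String) -> String {
    let hay = Array(text.unicodeScalars)
    let pat = Array(needle.unicodeScalars)
    var out = String.UnicodeScalarView()
    var i = 0
    while i < hay.count {
        if !pat.isEmpty, hay[i...].starts(with: pat) {
            out.append(contentsOf: replacement.unicodeScalars)
            i += pat.count
        } else {
            out.append(hay[i])
            i += 1
        }
    }
    return String(out)
}

typealias Rule = (find: String, replace: String)

// Assumption: each rule is applied to the whole text, in list order.
func canonize(_ text: String, rules: [Rule]) -> String {
    return rules.reduce(text) { replaceAll($1.find, with: $1.replace, in: $0) }
}

let boy = "\u{1F466}", lightSkin = "\u{1F3FB}"
let single: [Rule] = [
    (boy, "(ef)&#128102;(\\ef)"),
    (lightSkin, "(ef)&#127995;(\\ef)"),
]
let combined: Rule = (boy + lightSkin, "(ef)&#128102;&#127995;(\\ef)")
let text = "Hi " + boy + lightSkin + "!"

// Single codes first: the sequence is split into two wrapped codes.
print(canonize(text, rules: single + [combined]))
// Hi (ef)&#128102;(\ef)(ef)&#127995;(\ef)!

// Multiple-code entry first: the sequence is wrapped once.
print(canonize(text, rules: [combined] + single))
// Hi (ef)&#128102;&#127995;(\ef)!
```

Under that assumption, a shorter entry listed first consumes the start of a longer sequence, so the longer entry never gets a chance to match. Listing the multiple-code entries first avoids this.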
--
This is the BBEdit Talk public discussion group. If you have a feature request
or need technical support, please email "[email protected]" rather than
posting here. Follow @bbedit on Twitter: <https://twitter.com/bbedit>