update-unicode

Beat Bolli Tue, 13 Dec 2016 15:44:47 -0800

As it's used only by a tiny minority of the Git developer population,
this script does not belong in the main Git source directory.

Move it into contrib/ and adjust the paths to account for the new
location.

Signed-off-by: Beat Bolli <dev+...@drbeat.li>
---
 .gitignore                               |  1 -
 contrib/update-unicode/.gitignore        |  3 +++
 contrib/update-unicode/README            | 20 ++++++++++++++++
 contrib/update-unicode/update_unicode.sh | 38 ++++++++++++++++++++++++++++++
 update_unicode.sh                        | 40 --------------------------------
 5 files changed, 61 insertions(+), 41 deletions(-)
 create mode 100644 contrib/update-unicode/.gitignore
 create mode 100644 contrib/update-unicode/README
 create mode 100755 contrib/update-unicode/update_unicode.sh
 delete mode 100755 update_unicode.sh

diff --git a/.gitignore b/.gitignore
index f96e50e..5555ae0 100644
--- a/.gitignore
+++ b/.gitignore
@@ -204,7 +204,6 @@
 /config.mak.autogen
 /config.mak.append
 /configure
-/unicode
 /tags
 /TAGS
 /cscope*
diff --git a/contrib/update-unicode/.gitignore b/contrib/update-unicode/.gitignore
new file mode 100644
index 0000000..b0ebc6a
--- /dev/null
+++ b/contrib/update-unicode/.gitignore
@@ -0,0 +1,3 @@
+uniset/
+UnicodeData.txt
+EastAsianWidth.txt
diff --git a/contrib/update-unicode/README b/contrib/update-unicode/README
new file mode 100644
index 0000000..b9e2fc8
--- /dev/null
+++ b/contrib/update-unicode/README
@@ -0,0 +1,20 @@
+TL;DR: Run update_unicode.sh after the publication of a new Unicode
+standard and commit the resulting unicode_width.h file.
+
+The long version
+================
+
+The Git source code ships the file unicode_width.h, which contains
+tables of zero-width and double-width Unicode code points.
+These tables are generated using update_unicode.sh in this directory.
+update_unicode.sh itself uses a third-party tool, uniset, to query two
+Unicode data files for the relevant code points.
+
+On first run, update_unicode.sh clones uniset from GitHub and builds it.
+This requires a recent version of autoconf (2.69 works as of December
+2016).
+
+On each run, update_unicode.sh checks whether more recent Unicode data
+files are available from the Unicode Consortium, and rebuilds the header
+unicode_width.h with the new data. The new header can then be
+committed.
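As posted, update_unicode.sh downloads UnicodeData.txt and EastAsianWidth.txt only when they are missing from the working directory. A minimal sketch of the refresh-on-every-run behavior described in the README, assuming wget's -N (timestamping) option, which re-fetches a file only when the server copy is newer than the local one. The names fetch_ucd and UCD_URL are hypothetical and not part of Git.

```shell
#!/bin/sh
# Sketch (assumption): refresh the UCD data files only when the server
# copy is newer than the local one, using wget's timestamping mode (-N).
# fetch_ucd and UCD_URL are hypothetical names, not part of Git.
UCD_URL=${UCD_URL:-http://www.unicode.org/Public/UCD/latest/ucd}

fetch_ucd () {
	for f in UnicodeData.txt EastAsianWidth.txt
	do
		# -N compares the remote timestamp and size with the local file
		wget -N "$UCD_URL/$f" || return 1
	done
}
```

Called from contrib/update-unicode before uniset runs, fetch_ucd would take the place of the two `if ! test -f ...` download blocks; the rest of the script could stay as it is.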
diff --git a/contrib/update-unicode/update_unicode.sh b/contrib/update-unicode/update_unicode.sh
new file mode 100755
index 0000000..7b90126
--- /dev/null
+++ b/contrib/update-unicode/update_unicode.sh
@@ -0,0 +1,38 @@
+#!/bin/sh
+#See http://www.unicode.org/reports/tr44/
+#
+#Me Enclosing_Mark  an enclosing combining mark
+#Mn Nonspacing_Mark a nonspacing combining mark (zero advance width)
+#Cf Format          a format control character
+#
+cd "$(dirname "$0")"
+UNICODEWIDTH_H=$(git rev-parse --show-toplevel)/unicode_width.h
+(
+       if ! test -f UnicodeData.txt; then
+               wget http://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt
+       fi &&
+       if ! test -f EastAsianWidth.txt; then
+               wget http://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt
+       fi &&
+       if ! test -d uniset; then
+               git clone https://github.com/depp/uniset.git
+       fi &&
+       (
+               cd uniset &&
+               if ! test -x uniset; then
+                       autoreconf -i &&
+                       ./configure --enable-warnings=-Werror CFLAGS='-O0 -ggdb'
+               fi &&
+               make
+       ) &&
+       UNICODE_DIR=. && export UNICODE_DIR &&
+       cat >"$UNICODEWIDTH_H" <<-EOF
+       static const struct interval zero_width[] = {
+               $(uniset/uniset --32 cat:Me,Mn,Cf + U+1160..U+11FF - U+00AD |
+                 grep -v plane)
+       };
+       static const struct interval double_width[] = {
+               $(uniset/uniset --32 eaw:F,W)
+       };
+       EOF
+)
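The two uniset queries in the script can be cross-checked without building uniset. The sketch below approximates them with POSIX awk and sort: zero_width_ranges reads UnicodeData.txt, whose third semicolon-separated field is the General_Category (Me, Mn, Cf, as listed in the comment at the top of the script), and double_width_ranges reads the East_Asian_Width values F and W from EastAsianWidth.txt. The function and variable names are hypothetical, and the interval format is illustrative only; it is not claimed to match uniset's output byte for byte.

```shell
#!/bin/sh
# Sketch (assumption): approximate the two uniset queries in
# update_unicode.sh with POSIX awk and sort, for cross-checking.
# zero_width_ranges, double_width_ranges, HEX_AWK and MERGE_AWK are
# hypothetical names, not part of Git.

# awk helper: convert a hex string such as "11FF" to a number.
HEX_AWK='
function hex(s,   i, v) {
	v = 0
	s = toupper(s)
	for (i = 1; i <= length(s); i++)
		v = v * 16 + index("0123456789ABCDEF", substr(s, i, 1)) - 1
	return v
}'

# awk program: read sorted, unique decimal code points, one per line,
# and print each run of consecutive code points as one interval.
MERGE_AWK='
NR == 1 { lo = hi = $1; next }
$1 == hi + 1 { hi = $1; next }
{ printf "{ 0x%04X, 0x%04X },\n", lo, hi; lo = hi = $1 }
END { if (NR) printf "{ 0x%04X, 0x%04X },\n", lo, hi }'

# cat:Me,Mn,Cf + U+1160..U+11FF - U+00AD, read from UnicodeData.txt.
# Assumes no Me/Mn/Cf character is encoded as a <..., First>/<..., Last>
# range pair, so every matching line names a single code point.
zero_width_ranges () {
	awk -F ';' "$HEX_AWK"'
	BEGIN { for (c = hex("1160"); c <= hex("11FF"); c++) print c }
	($3 == "Me" || $3 == "Mn" || $3 == "Cf") && $1 != "00AD" { print hex($1) }
	' "$@" | sort -n -u | awk "$MERGE_AWK"
}

# eaw:F,W, read from EastAsianWidth.txt ("XXXX;W" or "XXXX..YYYY ; F").
double_width_ranges () {
	awk "$HEX_AWK"'
	{
		sub(/#.*/, "")                 # drop comments
		if (split($0, f, ";") < 2)
			next                       # blank or comment-only line
		gsub(/[ \t]/, "", f[1])
		gsub(/[ \t]/, "", f[2])
		if (f[2] != "F" && f[2] != "W")
			next
		n = split(f[1], r, "[.][.]")   # single code point or XXXX..YYYY
		e = hex(r[n])
		for (c = hex(r[1]); c <= e; c++)
			print c
	}
	' "$@" | sort -n -u | awk "$MERGE_AWK"
}
```

For example, `zero_width_ranges UnicodeData.txt` and `double_width_ranges EastAsianWidth.txt` could be run in contrib/update-unicode once the data files have been downloaded. Expanding ranges to single code points before merging keeps one shared merge step for both tables, at the cost of sorting a few hundred thousand lines.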
diff --git a/update_unicode.sh b/update_unicode.sh
deleted file mode 100755
index 27af77c..0000000
--- a/update_unicode.sh
+++ /dev/null
@@ -1,40 +0,0 @@
-#!/bin/sh
-#See http://www.unicode.org/reports/tr44/
-#
-#Me Enclosing_Mark  an enclosing combining mark
-#Mn Nonspacing_Mark a nonspacing combining mark (zero advance width)
-#Cf Format          a format control character
-#
-UNICODEWIDTH_H=../unicode_width.h
-if ! test -d unicode; then
-       mkdir unicode
-fi &&
-( cd unicode &&
-       if ! test -f UnicodeData.txt; then
-               wget http://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt
-       fi &&
-       if ! test -f EastAsianWidth.txt; then
-               wget http://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt
-       fi &&
-       if ! test -d uniset; then
-               git clone https://github.com/depp/uniset.git
-       fi &&
-       (
-               cd uniset &&
-               if ! test -x uniset; then
-                       autoreconf -i &&
-                       ./configure --enable-warnings=-Werror CFLAGS='-O0 -ggdb'
-               fi &&
-               make
-       ) &&
-       UNICODE_DIR=. && export UNICODE_DIR &&
-       cat >$UNICODEWIDTH_H <<-EOF
-       static const struct interval zero_width[] = {
-               $(uniset/uniset --32 cat:Me,Mn,Cf + U+1160..U+11FF - U+00AD |
-                 grep -v plane)
-       };
-       static const struct interval double_width[] = {
-               $(uniset/uniset --32 eaw:F,W)
-       };
-       EOF
-)
-- 
2.7.2

[PATCH v2 1/6] update_unicode.sh: move it into contrib/update-unicode
