Package: edict
Version: 2009.03.03-1
Severity: wishlist
Tags: patch
There are a number of interfaces to edict in several environments,
including GTK, emacs, and the command line (xjdic, which unfortunately
doesn't work in modern UTF-8 terminals). Even so, I find myself
running grep on edict very often. My queries have a great deal of
regularity to them (searching for whole words, exact fields, common
entries, and so on), so I ended up writing a bash script that works
somewhat like an xjdic-lite.
I thought other people might find this script convenient, perhaps
enough to bundle it with the edict package (or ship it as a separate
one?), so I leave it to the maintainer's discretion: if you think it
sounds useful, feel free to distribute it; if you think it's bloat, no
hard feelings here.
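For reference, the hand-written grep queries I mean look roughly like this. It is a minimal sketch, assuming GNU grep, and it runs against a small sample file built on the spot: the entries and the variable name sample_edict are illustrative only, and the sample is stored as UTF-8 for readability, whereas the real edict file is EUC-JP (which is why the script converts patterns and output with iconv).

```shell
#!/bin/bash
set -e
# Illustrative sample in edict's layout: KANJI [KANA] /english 1/english 2/.../
# (UTF-8 here; the real /usr/share/edict/edict is EUC-JP)
sample_edict=$(mktemp)
trap 'rm -f "$sample_edict"' EXIT
cat > "$sample_edict" <<'EOF'
冷戦 [れいせん] /(n) cold war/(P)/
風邪 [かぜ] /(n) cold/common cold/(P)/
寒冷前線 [かんれいぜんせん] /(n) cold front/
戦争 [せんそう] /(n,vs) war/(P)/
暖かい [あたたかい] /(adj-i) warm/(P)/
EOF

# Whole-word match: "war", but not "warm"
grep -E -i '\<war\>' "$sample_edict"

# Exact English field, skipping leading tags such as "(n) "
grep -E -i '/(\([^)]*\) *)*cold/' "$sample_edict"

# Common entries only (edict tags them with a literal "(P)")
grep -F '(P)' "$sample_edict"

# AND search: every term must appear somewhere in the entry
grep -E -i 'cold' "$sample_edict" | grep -E -i 'war'
```

Typing these by hand gets old quickly, which is what the script automates.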
-- System Information:
Debian Release: squeeze/sid
APT prefers unstable
APT policy: (500, 'unstable')
Architecture: i386 (i686)
Kernel: Linux 2.6.29-2-686 (SMP w/1 CPU core)
Locale: LANG=ja_JP.UTF-8, LC_CTYPE=ja_JP.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
edict depends on no packages.
edict recommends no packages.
Versions of packages edict suggests:
ii lookup 1.08b-10 interactive utility to search text
-- no debconf information
#!/bin/bash
set -e
common=n
word=n
exact=n
case_sensitive=n
encoding="$(locale -k charmap|sed -e 's/.*="\([^"]*\)"/\1/')"
edict=/usr/share/edict/edict
function usage()
{
cat <<EOF
Usage: $(basename "$0") [options] <query>...
Options:
  -p: Match only common entries (those tagged "(P)" in edict).
  -w: Word match; each English query must match whole words only.
  -e: Exact match; each query must match an entire edict field.
  -s: Make English queries case-sensitive (default is insensitive).
  -f EDICT_FILE: Path to edict (default: $edict).
     This program assumes edict is in the original EUC-JP encoding.
  -c ENCODING: Current terminal character encoding (default: $encoding).
  -h: Show this message.
Multiple queries are combined as an AND search; that is,
  \$ $(basename "$0") cold war
will match all entries mentioning both 'cold' and 'war'. To
search for an exact phrase, quote it for the shell:
  \$ $(basename "$0") "cold war"
(and consider -e).
Be sure to put the queries only AFTER all options!
EOF
}
while getopts "pwesf:c:h" opt; do
case $opt in
p) common=y;;
w) word=y;;
e) exact=y;;
s) case_sensitive=y;;
f) edict="$OPTARG";;
c) encoding="$OPTARG";;
h) usage; exit 0;;
*) usage 1>&2; exit 1;;
esac
done
shift $((OPTIND - 1))
if [ $# -eq 0 ]; then
usage 1>&2
exit 1
fi
if ! [ "$encoding" ]; then
cat 1>&2 <<EOS
Error: could not guess terminal character encoding from \`locale -k charmap\`.
(You can use option -c as a workaround.)
EOS
exit 1
fi
if ! [ -r "$edict" ]; then
echo "Error: could not read edict file \`$edict'." 1>&2
echo "Use option -f to specify the dictionary file." 1>&2
exit 1
fi
function encode()
{
iconv -f "$encoding" -t EUC-JP
}
function decode()
{
iconv -f EUC-JP -t "$encoding"
}
tmp1=$(mktemp)
tmp2=$(mktemp)
function cleanup()
{
rm -f "$tmp1" "$tmp2"
}
trap cleanup EXIT
declare -a queries
for arg in "$@"; do
regexp="$arg"
if [ $word == y ]; then
regexp="\<$regexp\>"
fi
if [ $exact == y ]; then
# catches kanji words at start of line
tmp="(^$regexp )"
# catches pronunciations (between brackets)
tmp+="|(\\[$regexp\\])"
# catches exact English queries, ignoring any initial dictionary tags
# between parentheses, such as "(n)"
tmp+="|(/(\\([^)]*\\) *)*$regexp)/"
regexp="$tmp"
fi
queries+=("$regexp")
done
grepopts='-E'
if ! [ $case_sensitive == y ]; then
grepopts+=' -i'
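fi
# edict is EUC-JP, so let grep work on raw bytes in the C locale;
# in a UTF-8 locale, newer GNU grep may report the file as binary
export LC_ALL=C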
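first=y
for query in "${queries[@]}"; do
equery="$(printf '%s\n' "$query" | encode)"
if [ $first == y ]; then
first=n
# a query with no hits is not an error; it just leaves $tmp1 empty
grep $grepopts -- "$equery" "$edict" > "$tmp1" || true
else
# always overwrite $tmp1, even with an empty result; otherwise a
# term with no hits would leave the previous results in place
grep $grepopts -- "$equery" "$tmp1" > "$tmp2" || true
cp "$tmp2" "$tmp1"
fi
done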
if [ $common == y ]; then
# edict tags common entries with a literal "(P)"; the parentheses
# must be escaped because grep runs with -E
grep $grepopts -- '\(P\)' "$tmp1" > "$tmp2" || true
cp "$tmp2" "$tmp1"
fi
# a last grep to color the matched parts
color_regexp=''
for query in "${queries[@]}"; do
color_regexp+="|($query)"
done
# strip the leading '|' so the pattern has no empty alternative
color_regexp="$(printf '%s\n' "${color_regexp#|}" | encode)"
grep $grepopts --color=yes -- "$color_regexp" "$tmp1" | decode
cleanup
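And in case the -e pattern looks opaque, here is the part of the query loop that builds it, pulled out into a standalone function so it can be tried on its own. It is only a sketch: the function name exact_regexp and the two sample entries are made up for illustration, and it assumes GNU grep -E, as the script does.

```shell
#!/bin/bash
# Build the -e (exact match) pattern the same way the script's query
# loop does. exact_regexp is a hypothetical name used only here.
exact_regexp()
{
	local regexp="$1"
	# the headword (kanji) field at the start of the line
	local tmp="(^$regexp )"
	# the reading, between square brackets
	tmp+="|(\\[$regexp\\])"
	# a whole English field, ignoring leading tags such as "(n) "
	tmp+="|(/(\\([^)]*\\) *)*$regexp)/"
	printf '%s\n' "$tmp"
}

pattern=$(exact_regexp 'cold war')
printf '%s\n' "$pattern"
# (^cold war )|(\[cold war\])|(/(\([^)]*\) *)*cold war)/

# matches: "cold war" fills a whole English field
printf '%s\n' '冷戦 [れいせん] /(n) cold war/(P)/' | grep -E -- "$pattern"

# prints "no match": the field is "cold war era", not "cold war"
printf '%s\n' '冷戦時代 [れいせんじだい] /(n) cold war era/' \
	| grep -E -- "$pattern" || echo 'no match'
```

The third alternative only matches when the query runs up to the closing slash, which is what makes -e stricter than a plain substring search.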