About two years ago I wrote a hyphenation
(syllabification) script in PHP for a linguistics
project; I have just rewritten it in CL. (Why?)
I would like to claim that it does better than
Zemberek's hyphenation component, but I was too
lazy to run a comparison. Later, hopefully...
When compiling the file and loading it into Lisp,
you need to watch out for Unicode issues. In SBCL,
for example, load and compile-file should be called
with the :external-format :utf-8 argument.
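A minimal sketch of such a call. LOAD-UTF-8 is a hypothetical helper name and "hecele.lisp" is an assumed file name; neither comes with the file itself. The :utf-8 designator is implementation-defined; SBCL accepts it, other Lisps may spell it differently.

```lisp
;; LOAD-UTF-8 is a hypothetical helper, not part of the attached file.
(defun load-utf-8 (path)
  "Compile the source file at PATH as UTF-8, then load the compiled file."
  ;; COMPILE-FILE returns the pathname of the compiled file as its first
  ;; value, which is passed straight to LOAD.
  (load (compile-file path :external-format :utf-8)))

;; Usage, assuming the source was saved as "hecele.lisp":
;; (load-utf-8 "hecele.lisp")
```

To load the source without compiling it, (load "hecele.lisp" :external-format :utf-8) should work the same way.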
A few example uses:
CL-USER> (tr.gen.hb.hecele:hyphenate "turk")
("turk")
CL-USER> (tr.gen.hb.hecele:hyphenate "trabzon")
("trab" "zon")
CL-USER> (tr.gen.hb.hecele:hyphenate "kramp")
("kramp")
CL-USER> (tr.gen.hb.hecele:hyphenate "iyi")
("i" "yi")
CL-USER> (tr.gen.hb.hecele:hyphenate
"çekoslavakyalılaştıramadıklarımızdanmısınız")
("çe" "kos" "la" "vak" "ya" "lı" "laş" "tı" "ra" "ma" "dık" "la" "rı" "mız"
"dan" "mı" "sı" "nız")
CL-USER> (tr.gen.hb.hecele:hyphenate "Hâldun")
("Hâl" "dun")
CL-USER> (tr.gen.hb.hecele:hyphenate "trabzandan kayilmaz")
("trab" "zan" "dan " "ka" "yil" "maz")
CL-USER> (tr.gen.hb.hecele:hyphenate "Slovakya")
("Slo" "vak" "ya")
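The examples above can double as regression checks. A minimal sketch, assuming a hypothetical helper called FIND-MISMATCHES that is not part of the attached file; it takes the hyphenation function as an argument, so it does not depend on the package being loaded:

```lisp
;; FIND-MISMATCHES is a hypothetical helper, not part of the attached file.
;; FN is any function from a word to a list of syllables; CASES is a list
;; of (word . expected-syllables) entries.
(defun find-mismatches (fn cases)
  "Return a list of (word expected actual) for every case where FN disagrees."
  (loop for (word . expected) in cases
        for actual = (funcall fn word)
        unless (equal expected actual)
          collect (list word expected actual)))

;; Usage, once the package is loaded:
;; (find-mismatches #'tr.gen.hb.hecele:hyphenate
;;                  '(("trabzon" "trab" "zon") ("iyi" "i" "yi")))
```

An empty result means every case agrees with its expected syllables; anything else is a candidate for a bug report.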
If you find words that you think it hyphenates
incorrectly, please let me know.
Haldun.
Note: The permanent address of the file (unless something happens to it) is
http://knuth.cs.bilgi.edu.tr/~hb
(in-package :common-lisp-user)

(defpackage :tr.gen.hb.hecele
  (:use :common-lisp)
  (:export :hyphenate))

(in-package :tr.gen.hb.hecele)
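;; Syllable patterns over c (consonant) and v (vowel), each paired with the
;; length of the first syllable. FIRST-HYPHEN takes the first pattern that is
;; a prefix of the word, so longer patterns must precede their own prefixes.
;; DEFPARAMETER rather than DEFCONSTANT: the list built at load time is not
;; EQL to the one built at compile time, which SBCL treats as an illegal
;; redefinition of a constant.
(defparameter +tests+
  '(("cccvcc" . 5) ("cccvcv" . 4)
    ("cccvc" . 5) ("ccvccc" . 5)
    ("ccvccv" . 4) ("ccvcc" . 5)
    ("ccvcv" . 3)
    ("ccvc" . 4) ("ccvv" . 3)
    ("ccv" . 3) ("cvccc" . 4)
    ("cvccv" . 3) ("cvcc" . 4) ("cvcv" . 2)
    ("cvc" . 3) ("cv" . 2) ("vccc" . 3)
    ("vccv" . 2) ("vcc" . 3) ("vcvc" . 1)
    ("vcv" . 1) ("vc" . 2) ("v" . 1)))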
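;; Turkish vowels in both cases, including the circumflexed â, î and û.
;; DEFPARAMETER for the same reason as +TESTS+: the value is a list.
(defparameter +vowels+
  '(#\a #\A #\â #\Â #\e #\E #\ı #\I #\i #\İ #\î #\Î
    #\o #\O #\ö #\Ö #\u #\U #\û #\Û #\ü #\Ü))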
(defun hyphenate (word)
  "Returns the list of syllables of WORD, in order.
Signals an error if WORD cannot be split into syllables."
  (if (string-equal word "")
      nil
      (let ((hyphen (first-hyphen word)))
        (if hyphen
            (cons hyphen (hyphenate (subseq word (length hyphen))))
            (error "Cannot hyphenate ~s" word)))))
(defun first-hyphen (word)
  "Returns the first syllable of WORD according to the patterns in
+TESTS+, or NIL if no pattern matches the start of WORD."
  (let ((cv-word (convert-to-cv word)))
    (dolist (test +tests+)
      (when (starts-with cv-word (car test))
        (return (subseq word 0 (cdr test)))))))
;; utilities
;; The macros come before CONVERT-TO-CV so that DO-STRING is already known
;; as a macro when that function is compiled.

(defmacro with-gensyms ((&rest syms) &body body)
  "A classical with-gensyms macro."
  `(let ,(mapcar #'(lambda (s) (list s '(gensym))) syms)
     ,@body))

(defmacro do-string ((var str) &body body)
  "Iterates over the characters of STR, binding each to VAR in turn.
Very similar to dolist."
  (with-gensyms (gvar glength gstr)
    `(let* ((,gstr ,str)
            (,glength (length ,gstr)))
       (do ((,gvar 0 (+ ,gvar 1)))
           ((>= ,gvar ,glength))
         (let ((,var (char ,gstr ,gvar)))
           ,@body)))))

(defun starts-with (str prefix)
  "Returns a non-nil value if STR starts with PREFIX. Case insensitive."
  (and (>= (length str) (length prefix))
       (string-equal (subseq str 0 (length prefix)) prefix)))

(defun vowel-p (char)
  "Returns a non-nil value if CHAR is a vowel."
  (member char +vowels+ :test #'char-equal))

(defun convert-to-cv (word)
  "Converts WORD into c-v notation, e.g. haldun => cvccvc."
  ;; Write one character per input character instead of rebuilding the
  ;; result string with CONCATENATE on every step.
  (with-output-to-string (out)
    (do-string (char word)
      (write-char (if (vowel-p char) #\v #\c) out))))
_______________________________________________
cs-lisp mailing list
http://church.cs.bilgi.edu.tr/lcg
http://cs.bilgi.edu.tr/mailman/listinfo/cs-lisp