:>: -----Original Message-----
:>: From: IBM Mainframe Discussion List [mailto:[email protected]] On Behalf Of Steve Comstock
:>: Sent: Monday, November 19, 2012 2:00 PM
:>: To: [email protected]
:>: Subject: Re: Parsing (was: "New" way to do UCB lookups)
:>:
:>: On 11/19/2012 2:56 PM, Paul Gilmartin wrote:
:>: > On Mon, 19 Nov 2012 21:39:57 +0000, Lindy Mayfield wrote:
:>: >>
:>: >> It gets me all Lewis Carroll just thinking about it. I cannot even imagine how to create something like that SQL in Finnish. Something so simple as that, I cannot even think how a computer could parse it written in an agglutinative language. Though I am a bear of very little brain, so I'm sure it could be done. :-)
:>: >>
:>: > Wouldn't this be somewhat like FORTRAN, where the lexical analyzer first removes
:>: > _all_[1] blanks, rendering the source code maximally agglutinative, then attempts
:>: > to parse the mess so created?
:>: >
:>: > [1] Well, except in quoted or counted text strings.
:>: >
:>: >> So to bring it a bit back on to topic, English can be weird, but sometimes quite useful in its own way.
:>: >>
:>: > Classic Latin was written with no interword separators.
:>:
:>: Interesting. I didn't know that. Japanese is written with no
:>: interword separators also.

According to one of the folks I worked with over there, on the rare occasions when the character sequence alone is not enough to determine where a word break falls, they use a dot (think period or decimal point) to mark the break.
