> Date: Mon, 10 Oct 2011 17:47:21 +0800
> From: li bo <libo....@gmail.com>
>
> *P1. Split the text into separate paragraphs. A paragraph separator is kept
> with the previous paragraph. Within each paragraph, apply all the other
> rules of this algorithm.*
>
> Here, what does paragraph mean? which symbols can *Split the text into
> separate paragraphs*?

From section 3:

  Paragraphs are divided by the Paragraph Separator or appropriate
  Newline Function (for guidelines on the handling of CR, LF, and
  CRLF, see Section 4.4, Directionality, and Section 5.8, Newline
  Guidelines of [Unicode]). Paragraphs may also be determined by
  higher-level protocols: for example, the text in two different
  cells of a table will be in different paragraphs.

> I think only 'Enter' and '*Paragraph separator*' can do paragraph breaking.

In addition to the Paragraph Separator, _any_ newline function (LF,
CR+LF, CR, or NEL) can end a paragraph. Also U+2028, the LS character.
See section 5.8 of the Unicode Standard cited above.

IOW, from the UBA point of view, each line is a separate "paragraph".
Or at least this is my interpretation of the UBA ;-)

> what's the meaning of 'appropriate Newline Functions' and 'higher-level
> protocol paragraph determination'?

Newline Function (NLF) is described in Section 5.8 of Unicode.
Higher-level protocols are described in section 4.3 of UAX#9. In a
nutshell, your application can have its own ideas of what begins and
what ends a paragraph, and you are allowed to use those rules instead
of what P1 says.
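
To make the splitting step concrete, here is a minimal Python sketch of
one way to do it. It is only an illustration under stated assumptions,
not a reference implementation: the function name split_paragraphs and
its extra_separators parameter are made up for this example. It splits
after every character whose Bidi class is B (which covers LF, CR, NEL
and U+2029), treats CR+LF as a single separator, and keeps the
separator with the preceding paragraph. As far as I can tell from the
character database, U+2028 has Bidi class WS rather than B, so the
sketch splits on it only if the caller asks for that.

```python
import unicodedata


def split_paragraphs(text, extra_separators=""):
    """Split text into paragraphs, keeping each separator with the
    paragraph it ends.

    A character ends a paragraph if its Bidi class is "B" or if it is
    listed in extra_separators (a hypothetical knob for this sketch,
    e.g. "\u2028" to also break on LINE SEPARATOR).
    """
    paragraphs = []
    start = 0
    i = 0
    n = len(text)
    while i < n:
        ch = text[i]
        if unicodedata.bidirectional(ch) == "B" or ch in extra_separators:
            end = i + 1
            # Treat CR immediately followed by LF as one separator.
            if ch == "\r" and end < n and text[end] == "\n":
                end += 1
            paragraphs.append(text[start:end])
            start = end
            i = end
        else:
            i += 1
    # Trailing text without a final separator is still a paragraph.
    if start < n:
        paragraphs.append(text[start:])
    return paragraphs
```

Calling split_paragraphs(text) gives the default behaviour; passing
extra_separators="\u2028" makes LS end a paragraph as well. An
application that takes its paragraph boundaries from a higher-level
protocol (table cells, for instance) would skip this function and feed
each unit to the rest of the algorithm directly.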