Hi,

Can someone explain to me why the class SegmentingTokenizerBase uses a buffer 
of only 1024 characters? A comment in the source code mentions that a token may 
be truncated if no safe stopping index can be found among the characters 
currently in the buffer.

It doesn't seem reasonable to assume that a sentence is never longer than 1024 
characters, or that a safe stopping character, such as a newline, can always be 
found within a sentence.
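To make the concern concrete, here is a hypothetical, simplified sketch of the 
behaviour that comment describes, as I understand it: a fixed-size buffer is cut 
after the last line terminator it contains, and if it contains none, the whole 
buffer is emitted and the text is split at an arbitrary point. The class name 
SafeEndChunker, the set of "safe" characters and the tiny capacity are my own 
assumptions for illustration; this is not the actual Lucene code.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; NOT Lucene's SegmentingTokenizerBase.
final class SafeEndChunker {
    private final char[] buffer; // fixed capacity, like a 1024-char buffer

    SafeEndChunker(int capacity) { buffer = new char[capacity]; }

    // Assumption: only line terminators count as safe stopping points.
    static boolean isSafeEnd(char ch) {
        return ch == '\n' || ch == '\r' || ch == '\u0085'
            || ch == '\u2028' || ch == '\u2029';
    }

    // Index just past the last safe char in buffer[0..length), or -1 if none.
    private int findSafeEnd(int length) {
        for (int i = length - 1; i >= 0; i--) {
            if (isSafeEnd(buffer[i])) return i + 1;
        }
        return -1;
    }

    // Splits the input into chunks no longer than the buffer capacity.
    List<String> chunks(Reader in) {
        List<String> out = new ArrayList<>();
        int length = 0;      // chars currently held in the buffer
        boolean eof = false;
        try {
            while (true) {
                // Fill the free part of the buffer until it is full or input ends.
                while (length < buffer.length) {
                    int n = in.read(buffer, length, buffer.length - length);
                    if (n == -1) { eof = true; break; }
                    length += n;
                }
                if (length == 0) break;
                int end = eof ? length : findSafeEnd(length);
                if (end == -1) end = length; // no safe end: cut mid-sentence
                out.add(new String(buffer, 0, end));
                // Shift leftover chars to the front for the next round.
                System.arraycopy(buffer, end, buffer, 0, length - end);
                length -= end;
                if (eof) break; // at EOF everything was emitted above
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

With a capacity of 8, the input "abc\ndefghijklmnop" yields "abc\n", then 
"defghijk" and "lmnop": the second chunk is forced because the buffer fills up 
without any line terminator, so a run of text gets split wherever the buffer 
happens to end, which is the case I am asking about.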

Thanks,

Guan

