Dear all,

I have not yet traced the whole process, but because of some unexpected doccat results I took a quick look at the code.

Can you confirm that the DoccatTrainerTool tokenizes on whitespace (when creating DocumentSample) while the DoccatTool uses the SimpleTokenizer? If so, this should not be the case: both should use the same tokenizer, specifically the whitespace tokenizer. If that is not what happens, which tokenizer is used?

Best regards,
/Nicolas
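
P.S. To illustrate why such a mismatch would matter, here is a minimal sketch. It does not use the OpenNLP classes: `whitespaceTokens` and `charClassTokens` are hypothetical stand-ins that I wrote to approximate a whitespace tokenizer and a character-class tokenizer (the latter is a simplification and may differ in detail from what SimpleTokenizer really does). The point is only that the same text yields different tokens, so features seen at training time would not match those seen at classification time.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; these are NOT the OpenNLP implementations.
class TokenizerMismatch {

    // Stand-in for a whitespace tokenizer: split on runs of whitespace only.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // 0 = whitespace, 1 = letter, 2 = digit, 3 = anything else.
    static int charClass(char c) {
        if (Character.isWhitespace(c)) return 0;
        if (Character.isLetter(c)) return 1;
        if (Character.isDigit(c)) return 2;
        return 3;
    }

    // Simplified stand-in for a character-class tokenizer: a new token
    // starts whenever the character class changes; whitespace is dropped.
    static List<String> charClassTokens(String text) {
        List<String> tokens = new ArrayList<>();
        int start = -1; // start index of the current token, -1 if none
        for (int i = 0; i <= text.length(); i++) {
            // Treat the end of the text like whitespace to flush the last token.
            int cls = i < text.length() ? charClass(text.charAt(i)) : 0;
            if (start >= 0 && cls != charClass(text.charAt(start))) {
                tokens.add(text.substring(start, i));
                start = -1;
            }
            if (start < 0 && cls != 0) {
                start = i;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Don't pay $5.00!";
        System.out.println("whitespace : " + whitespaceTokens(text));
        System.out.println("char-class : " + charClassTokens(text));
    }
}
```

With the sample sentence, the first method keeps `Don't` and `$5.00!` as single tokens, while the second breaks them into several pieces, so a model trained on one tokenization would see mostly unknown features under the other.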
