Joe, I normally train with the bydate split training set, so I don't know the layout of the data you are using.
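For comparing against your directory, here is a rough sketch of a layout check. It assumes what I understand the trainer to expect: one directory per newsgroup, with each file in those directories being a single message. The function name `check_newsgroup_layout` is made up for illustration and is not part of Mahout.

```python
import os


def check_newsgroup_layout(root):
    """Return a list of problems found under root (hypothetical helper).

    An empty list means root looks like: one directory per newsgroup,
    each holding plain message files.
    """
    if not os.path.isdir(root):
        return [f"{root} is not a directory"]

    entries = sorted(os.listdir(root))
    if not entries:
        return [f"{root} is empty"]

    problems = []
    for name in entries:
        path = os.path.join(root, name)
        # Every top-level entry should be a newsgroup directory.
        if not os.path.isdir(path):
            problems.append(f"{name}: expected a newsgroup directory, found a file")
            continue

        children = sorted(os.listdir(path))
        if not children:
            problems.append(f"{name}: no message files")
        for child in children:
            # A directory here suggests root is one level too high.
            if os.path.isdir(os.path.join(path, child)):
                problems.append(
                    f"{name}/{child}: nested directory, the path may be one level too high"
                )
    return problems
```

You would call it on the same path you pass in as the training input and print whatever it returns. Any "nested directory" message is a hint that the path should point one level further down.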
My guess is that you are giving a location that is one level too high. The assumption of the trainnewsgroup program is that it will see one directory per newsgroup, and each file in those directories will be a single message.

The change you made is a good one for robustness. We should probably make an additional one that checks that the directories look right.

If you want to try on exactly the same data, take a look at Jason Rennie's site for the 20news-bydate data set. You would pass in the path to the train data there.

Sent from my iPhone

On Oct 9, 2010, at 9:25 PM, Joe Kumar <[email protected]> wrote:
