Hi Ted,

I was running the training with 20news-18828 (which is not sorted by date). Going through the code, I saw that the program was looking for a directory for each newsgroup, so I guess I specified the correct directory path.
I downloaded the 20news-bydate data set. The 20news-bydate directory didn't have a .DS_Store file, so the program ran just fine, and the ExecutionException didn't occur either.

I couldn't understand how to interpret the output, and I am trying to find out where I could get more info on the basics of SGD. Any help with this would be great.

regards,
Joe

On Sun, Oct 10, 2010 at 9:49 AM, Ted Dunning wrote:
> Joe
>
> I normally train with the bydate split training set so I don't know the
> layout of the data you are using.
>
> My guess is that you are giving the location that is one level too high.
> The assumption of the trainnewsgroup program is that it will see one
> directory per news group and each file in those dirs will be a single
> message.
>
> The change you make is a good one for robustness. We should probably make
> an additional one that checks to see that the dirs look right.
>
> If you want to try on exactly the same data you can take a look at jason
> rennie's site for the 20news-bydate data set. You would pass in the path to
> the train data there.
>
> Sent from my iPhone
>
> On Oct 9, 2010, at 9:25 PM, Joe Kumar wrote:
>
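
For anyone hitting the same problem, the layout assumption described in this thread (one directory per newsgroup, each file inside it a single message) can be checked before training. The sketch below is only an illustration in Python, not part of Mahout: the function name check_newsgroup_layout is hypothetical, and skipping dot-files such as .DS_Store is an assumption based on the issue discussed here.

```python
from pathlib import Path


def check_newsgroup_layout(root):
    """Return a list of problems found under root; an empty list means the
    layout looks like one directory per newsgroup with message files inside.

    Hypothetical helper for illustration; hidden entries such as .DS_Store
    are ignored rather than reported.
    """
    root = Path(root)
    if not root.is_dir():
        return [f"{root} is not a directory"]

    problems = []
    groups = []
    for entry in sorted(root.iterdir()):
        if entry.name.startswith("."):
            continue  # skip hidden files such as .DS_Store
        if entry.is_dir():
            groups.append(entry)
        else:
            problems.append(f"unexpected file at top level: {entry.name}")

    if not groups:
        problems.append("no newsgroup directories found")

    for group in groups:
        children = [c for c in sorted(group.iterdir())
                    if not c.name.startswith(".")]
        if not children:
            problems.append(f"{group.name} has no message files")
        elif any(c.is_dir() for c in children):
            # Directories where messages are expected suggest the given
            # path is one level too high.
            problems.append(
                f"{group.name} contains subdirectories; "
                "the path may be one level too high"
            )
    return problems
```

Calling check_newsgroup_layout on the training directory and printing the returned list before starting a run gives a quick indication of whether the path points at the right level.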
