Hi Ted,

I was running the training with 20news-18828 (which is not sorted by date).
Going through the code, I saw that the program looks for a directory for
each newsgroup, so I think I specified the correct directory path.

I downloaded the 20news-bydate data set. The 20news-bydate directory didn't
have a .DS_Store file, so the program ran just fine. The ExecutionException
didn't happen either.
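
For reference, here is a minimal sketch of the kind of directory check
discussed in this thread. It is not the Mahout code: the class name
NewsgroupDirs and the method listNewsgroupDirs are hypothetical, and it
only assumes the layout described below (one directory per newsgroup, one
file per message). It skips stray top-level files such as .DS_Store and
complains if a newsgroup directory holds no files.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of Mahout.
public class NewsgroupDirs {

    // Returns the subdirectories of base (assumed to be one per newsgroup),
    // sorted by path. Plain files at the top level, such as .DS_Store,
    // are skipped rather than treated as newsgroups.
    public static List<File> listNewsgroupDirs(File base) {
        File[] dirs = base.listFiles(File::isDirectory);
        if (dirs == null) {
            throw new IllegalArgumentException(base + " is not a readable directory");
        }
        Arrays.sort(dirs);
        List<File> result = new ArrayList<>();
        for (File dir : dirs) {
            // Sanity check: each newsgroup directory should contain message files.
            File[] messages = dir.listFiles(File::isFile);
            if (messages == null || messages.length == 0) {
                throw new IllegalArgumentException(
                    dir + " contains no message files; is the path one level too high?");
            }
            result.add(dir);
        }
        return result;
    }
}
```

With a check like this, a path that is one level too high fails with a
readable message instead of an exception deep inside the training run.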

I couldn't understand how to interpret the output, and I am trying to find
where I could get more info on the basics of SGD. Any help with this would
be great.

regards,
Joe.

On Sun, Oct 10, 2010 at 9:49 AM, Ted Dunning <[email protected]> wrote:

> Joe
>
> I normally train with the bydate split training set so I don't know the
> layout of the data you are using.
>
> My guess is that you are giving the location that is one level too high.
> The assumption of the trainnewsgroup program is that it will see one
> directory per news group and each file in those dirs will be a single
> message.
>
> The change you make is a good one for robustness. We should probably make
> an additional one that checks to see that the dirs look right.
>
> If you want to try on exactly the same data, you can take a look at Jason
> Rennie's site for the 20news-bydate data set. You would pass in the path to
> the training data there.
>
> Sent from my iPhone
>
> On Oct 9, 2010, at 9:25 PM, Joe Kumar <[email protected]> wrote:
>
> >
>
