If you are making more changes, go ahead; you are more than welcome to. Just settle on a convention. For example, in the clustering algorithms chapter it was points and clusters-[0-n], like you said, and in Dirichlet it was state-n. It will be better if we stick to a single convention, and the book will follow it (it shouldn't be the other way around).

Robin

On Fri, Apr 23, 2010 at 11:30 PM, Jeff Eastman <j...@windwardsolutions.com> wrote:

> The APIs did not change but the clustered points directory changed from
> "points" to "clusteredPoints" and the various clusters directories changed
> from (e.g. canopies, clusters, clusters-n, canopies-n, state-n) to just
> clusters-n, where clusters-0 is used for the initial clusters needed for
> kmeans and is produced by canopy output by default.
>
> On 4/23/10 10:25 AM, Robin Anil wrote:
>
>> It's not aimed at 0.3 per se. Right now it's evolving with the code. For
>> example, the quality factor is something that will go in there. I keep
>> updating the code with the latest changes and so does Sean. There isn't
>> much that got affected by your latest commit though (it compiles). Though
>> I haven't fully tested the code with the dataset after the commit,
>> something I plan to do soon.
>>
>> Robin
>>
>> On Fri, Apr 23, 2010 at 9:51 PM, Jeff Eastman <j...@windwardsolutions.com> wrote:
>>
>>> I also wonder how much my recent clustering changes have affected the
>>> examples in the clustering sections. I know the book is currently aimed
>>> at Mahout 0.3 but users trying the examples with trunk may be frustrated
>>> by the recent changes in file naming. Do the examples exist in an
>>> unannotated version somewhere that I could get working again on trunk?
>>>
>>> On 4/23/10 9:10 AM, Sean Owen wrote:
>>>
>>>> Good eye, this was fixed in the manuscript a while ago.
>>>>
>>>> I will ping Manning to re-publish Chapters 1-6 since a lot of small
>>>> updates have happened since then.
>>>>
>>>> On Fri, Apr 23, 2010 at 4:53 PM, Jeff Eastman
>>>> <j...@windwardsolutions.com> wrote:
>>>>
>>>>> Section 4.5.1 says:
>>>>> "The third line shows how it is based on item-item similarities, not
>>>>> user-user similarities as before. The algorithms are similar, but not
>>>>> entirely symmetric. They do have notably different properties. For
>>>>> instance, the running time of an item-based recommender scales up as
>>>>> the number of items increases, whereas a user-based recommender’s
>>>>> running time goes up as the number of users increases.
>>>>>
>>>>> This suggests one reason that you might choose an item-based
>>>>> recommender: if the number of users is relatively low compared to the
>>>>> number of items, the performance advantage could be significant."
>>>>>
>>>>> Shouldn't the second paragraph be?
>>>>>
>>>>> "This suggests one reason that you might choose an item-based
>>>>> recommender: if the number of users is relatively *high* compared to
>>>>> the number of items, the performance advantage could be significant."