May I suggest keeping these names as constants in public String fields? That way people will not hard-code clusters-0 and so on, and can use Clusterer.CLUSTER_DIR instead.
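A minimal sketch of what this could look like. Only the name Clusterer.CLUSTER_DIR comes from this message. The class layout, the second constant, and the clusterDir helper are hypothetical and not confirmed Mahout API. The string values are taken from the directory names discussed in the thread below.

```java
// Hypothetical holder for the clustering directory names.
// Only the name Clusterer.CLUSTER_DIR comes from the suggestion above;
// everything else here is an assumption made for illustration.
final class Clusterer {

    /** Prefix of the per-iteration cluster directories, e.g. "clusters-0". */
    public static final String CLUSTER_DIR = "clusters-";

    /** Directory holding the clustered points (formerly "points"). */
    public static final String CLUSTERED_POINTS_DIR = "clusteredPoints";

    private Clusterer() {
        // constants holder, not meant to be instantiated
    }

    /** Builds the directory name for one iteration (hypothetical helper). */
    public static String clusterDir(int iteration) {
        if (iteration < 0) {
            throw new IllegalArgumentException("iteration must be >= 0");
        }
        return CLUSTER_DIR + iteration;
    }
}
```

Examples would then call Clusterer.clusterDir(0) rather than spelling out "clusters-0", so a future rename touches one place only.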

On Fri, Apr 23, 2010 at 11:55 PM, Jeff Eastman <j...@windwardsolutions.com> wrote:

> My main goal for reworking the file nomenclature was to make the various
> clustering file names follow a consistent naming convention. I don't expect
> that to change again any time soon but I noticed that some of the examples
> need to be updated to work with trunk (0.4).
>
> On 4/23/10 11:11 AM, Robin Anil wrote:
>
>> If you are making more changes do that, you are more than welcome to. Just
>> fix a convention. For example, in the clustering algorithms chapter, it was
>> points and clusters-[0-n] like you said. and in dirichlet it was state-n. So
>> it will be better if we stick to a single convention and the book will
>> follow(shouldn't be the other way around)
>>
>> Robin
>>
>> On Fri, Apr 23, 2010 at 11:30 PM, Jeff Eastman <j...@windwardsolutions.com> wrote:
>>
>>> The APIs did not change but the clustered points directory changed from
>>> "points" to "clusteredPoints" and the various clusters directories changed
>>> from (e.g. canopies, clusters, clusters-n, canopies-n, state-n) to just
>>> clusters-n, where clusters-0 is used for the initial clusters needed for
>>> kmeans and is produced by canopy output by default.
>>>
>>> On 4/23/10 10:25 AM, Robin Anil wrote:
>>>
>>>> Its not aimed at 0.3 per say. Right now its evolving with the code. For. eg.
>>>> the quality factor is something that will go in there. I keep updating the
>>>> code with the latest changes and so does Sean. There isnt much that got
>>>> affected by your latest commit though(it compiles). Though I haven't fully
>>>> tested the code with the dataset after the commit, something I plan to do
>>>> soon.
>>>>
>>>> Robin
>>>>
>>>> On Fri, Apr 23, 2010 at 9:51 PM, Jeff Eastman <j...@windwardsolutions.com> wrote:
>>>>
>>>>> I also wonder how much my recent clustering changes have affected the
>>>>> examples in the clustering sections. I know the book is currently aimed at
>>>>> Mahout 0.3 but users trying the examples with trunk may be frustrated by the
>>>>> recent changes in file naming. Do the examples exist in an unannotated
>>>>> version somewhere that I could get working again on trunk?
>>>>>
>>>>> On 4/23/10 9:10 AM, Sean Owen wrote:
>>>>>
>>>>>> Good eye, this was fixed in the manuscript a while ago.
>>>>>>
>>>>>> I will ping Manning to re-publish Chapters 1-6 since a lot of small
>>>>>> updates have happened since then.
>>>>>>
>>>>>> On Fri, Apr 23, 2010 at 4:53 PM, Jeff Eastman <j...@windwardsolutions.com> wrote:
>>>>>>
>>>>>>> Section 4.5.1 says:
>>>>>>> "The third line shows how it is based on item-item similarities, not
>>>>>>> user-user similarities as before. The algorithms are similar, but not
>>>>>>> entirely symmetric. They do have notably different properties. For
>>>>>>> instance, the running time of an item-based recommender scales up as the
>>>>>>> number of items increases, whereas a user-based recommender’s running time
>>>>>>> goes up as the number of users increases.
>>>>>>>
>>>>>>> This suggests one reason that you might choose an item-based recommender:
>>>>>>> if the number of users is relatively low compared to the number of items,
>>>>>>> the performance advantage could be significant."
>>>>>>>
>>>>>>> Shouldn't the second paragraph be?
>>>>>>>
>>>>>>> "This suggests one reason that you might choose an item-based recommender:
>>>>>>> if the number of users is relatively *high* compared to the number of
>>>>>>> items, the performance advantage could be significant."