Team, I've started documenting the first example (clustering of synthetic control data) from the quick start page. The TOC is as follows:
1. Introduction
2. Problem description
3. Pre-Prep (I'm thinking of moving this to a separate page and referencing it from all the example pages)
4. Perform Clustering
5. Read / Analyze Output

Does this TOC look OK? Can we standardize on a TOC so that all examples (current and future) present information in the same structure? If someone could glance through the documentation at https://cwiki.apache.org/confluence/display/MAHOUT/Clustering+of+synthetic+control+data and let me know of any feedback, that would be great. It will help me document the other examples.

A few questions:

1. How does an end user read the output data in the HDFS output directory? I tried reading /output/clusteredPoints/part-m-00000 and it doesn't look readable (it doesn't contain the control data numbers in a readable form). See the sketch in the P.S. below for what I'm guessing is needed.
2. How can someone validate the accuracy of the clusters generated from the control data?

regards,
Joe
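P.S. To make question 1 concrete, here is the kind of standalone reader I'm guessing is needed, assuming the part file under clusteredPoints is a Hadoop SequenceFile. This is only a minimal sketch (the class name is mine, the path is the one from question 1, and the key/value types are obtained via reflection rather than assumed). Is this the right approach for the docs, or is there a built-in dump utility (e.g. "hadoop fs -text") that readers should use instead?

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

// Minimal sketch: dump the clustered points to stdout, one record per line.
public class ClusteredPointsDumper {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // Path from question 1; adjust to wherever the clustering job wrote its output.
    Path path = new Path("/output/clusteredPoints/part-m-00000");

    SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
    try {
      // Instantiate whatever key/value classes the job actually wrote.
      Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
      Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
      // Each record should pair a cluster id (key) with a clustered point (value);
      // toString() gives a human-readable rendering of both.
      while (reader.next(key, value)) {
        System.out.println(key + "\t" + value);
      }
    } finally {
      reader.close();
    }
  }
}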
