Last year, Rini Kaushik and I published a paper, "GreenHDFS: Towards An Energy-Conserving, Storage-Efficient, Hybrid Hadoop Compute Cluster", at HotPower'10 (PDF: http://www.usenix.org/event/hotpower10/tech/full_papers/Kaushik.pdf). It analyzed the "hotness" of files using real namenode audit logs from Yahoo production clusters.
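For anyone who wants to try a similar file-level count on their own cluster, here is a minimal sketch. It is not the code or method from the paper. It assumes the namenode audit log writes key=value fields such as cmd=open and src=/path on each line; the function name count_opens and the sample lines are made up for illustration, so adjust the patterns to whatever your logs actually contain.

```python
import re
from collections import Counter

# Assumed audit-log layout: whitespace-separated key=value fields,
# including cmd=<operation> and src=<file path>. This is an assumption;
# check it against the logs your namenode actually produces.
CMD_RE = re.compile(r"\bcmd=(\S+)")
SRC_RE = re.compile(r"\bsrc=(\S+)")


def count_opens(lines):
    """Count cmd=open events per file path (src) across audit-log lines."""
    counts = Counter()
    for line in lines:
        cmd = CMD_RE.search(line)
        src = SRC_RE.search(line)
        # Only reads (cmd=open) contribute to the per-file count.
        if cmd and src and cmd.group(1) == "open":
            counts[src.group(1)] += 1
    return counts


# Hypothetical sample lines, not taken from any real cluster.
SAMPLE = [
    "2011-11-12 09:09:01,000 INFO FSNamesystem.audit: ugi=alice ip=/10.0.0.1 cmd=open src=/data/a.log dst=null perm=null",
    "2011-11-12 09:09:02,000 INFO FSNamesystem.audit: ugi=bob ip=/10.0.0.2 cmd=open src=/data/a.log dst=null perm=null",
    "2011-11-12 09:09:03,000 INFO FSNamesystem.audit: ugi=bob ip=/10.0.0.2 cmd=delete src=/data/b.log dst=null perm=null",
]

if __name__ == "__main__":
    # Print files from most to least frequently opened.
    for path, n in count_opens(SAMPLE).most_common():
        print(n, path)
```

Counting per file rather than per block keeps the sketch in line with the file-level view described here; a fuller analysis would also look at when each file was last read, not just how often.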
Based on the scan-centric nature of hadoop applications, we focused on file hotness, not block hotness.

- Milind

---
Milind Bhandarkar
Greenplum Labs, EMC
(Disclaimer: Opinions expressed in this email are those of the author, and do not necessarily represent the views of any organization, past or present, the author might be affiliated with.)

On 11/12/11 9:09 AM, "Bharath Ravi" <bharathra...@gmail.com> wrote:

>Hi all,
>
>We're trying to perform some sort of monitoring on HDFS, that could detect
>when a datanode or a data-block
>is "hot". It would be useful to see patterns of popularity in live HDFS
>deployments.
>
>Would anyone know if there are any publicly available statistics on data
>access patterns that we could look at?
>
>Thanks a lot!
>--
>Bharath Ravi