xiaofei du:
> Hi All,
>
> I am a graduate student preparing for my diploma project. I have about
> three months to finish it, and I would like to do some work on HDFS.
> However, I have no idea what I could do to improve HDFS, so could you
> please give me some suggestions?
>
> I hope the suggested project can be done within three months; I cannot
> afford more time. So the project should not be too hard (at the same
> time, it should not be too easy, otherwise I cannot meet the graduation
> requirement :-) )
>
> Thank you!

Hi Xiaofei,
I have three other suggestions:

- Yesterday I got hit by MAPREDUCE-1283 [1]. This issue by itself is of
  course not enough for three months, but my idea is that you could take
  a general look at the developer tools: what is missing and what needs
  improvement.

- HBasene [2] is a project to store a Lucene index natively on HBase. It
  was inspired by Lucandra [3]. The HBasene project has stalled, but it
  would still be a very promising project IMHO, especially considering
  the upcoming talk on Google's new indexing infrastructure, Percolator
  [4], which uses BigTable to store the index.

- A backup system on top of HBase. It should try to store similar files
  near each other so that tablet compression works best. HBase's
  timestamps could be used to hold several versions of a file, letting
  HBase handle the expiration of old versions. As an additional task you
  could evaluate the feasibility of running HBase as a backup system
  with office desktop computers as region servers. This could utilize
  otherwise unused hard drive space.

[1] https://issues.apache.org/jira/browse/MAPREDUCE-1283
[2] http://github.com/akkumar/hbasene
[3] http://github.com/tjake/Lucandra
[4] http://www.theregister.co.uk/2010/09/24/google_percolator/

Hope I could help,

Thomas Koch, http://www.koch.ro
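To make the third suggestion concrete: the version semantics HBase gives you for free can be sketched in plain Java. Note this is an invented illustration, not the HBase client API; in real HBase each cell keeps up to a configured number of timestamped versions per column family, and expiration of excess or expired versions happens during compaction.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model of one HBase cell holding several versions of a
// backed-up file, keyed by timestamp. Names are invented for illustration.
class VersionedCell {
    static final int MAX_VERSIONS = 3; // analogous to a column family's VERSIONS setting

    private final NavigableMap<Long, byte[]> versions = new TreeMap<>();

    void put(long timestamp, byte[] value) {
        versions.put(timestamp, value);
        // Expire the oldest versions beyond the retention limit,
        // roughly what HBase does during compaction.
        while (versions.size() > MAX_VERSIONS) {
            versions.pollFirstEntry();
        }
    }

    byte[] getLatest() {
        return versions.isEmpty() ? null : versions.lastEntry().getValue();
    }

    // Most recent version at or before the given timestamp, i.e. a
    // point-in-time read of the backup.
    byte[] getAsOf(long timestamp) {
        Map.Entry<Long, byte[]> e = versions.floorEntry(timestamp);
        return e == null ? null : e.getValue();
    }

    int versionCount() {
        return versions.size();
    }
}
```

The point of the sketch is only that a timestamp-ordered map per cell gives you both "latest version" and "state as of time T" queries, while a retention limit bounds storage; HBase implements exactly this bookkeeping for you, so the backup system only has to choose row keys and timestamps.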