xiaofei du:
> Hi All,
>
> I am a graduate student, I am preparing for my diploma project. I have
> about 3 months to finish the project. I want to do some work on HDFS.
> However, I have no concept what I could do for improving HDFS. So could you
> guys please give me some suggestions?
>
> I hope the suggested project could be done within 3 months, I cannot afford
> more time. So the project should not be too hard (at the time, it should
> not be easy, otherwise, I cannot reach the graduation requirement :-) )
>
> thank you !!!

Hi,
you could write developer documentation of the inner workings of HDFS (+HBase, +MapReduce?) that HDFS users could understand. In addition to documenting the current state, you could cover:

- Different strategies to make the NameNode distributed
- The different approaches to append
- How does security with Kerberos work?

One challenge of such a project would be to make it as easy as possible for developers to understand whichever part of HDFS they're interested in. Another challenge is to choose a documentation format and workflow that keep the documentation current without much effort.

A completely different project that I also consider important for Hadoop: help Apache implement an infrastructure based on Git. This could help many projects in the long run. If you're interested in this, you should subscribe to infrastructure-...@apache.org and get in contact with Jukka Zitting <jukka.zitt...@gmail.com>.

Best regards,

Thomas Koch, http://www.koch.ro