xiaofei du:
> Hi All,
>
> I am a graduate student preparing for my diploma project. I have about
> three months to finish it, and I would like to do some work on HDFS.
> However, I have no idea what I could do to improve HDFS, so could you
> please give me some suggestions?
>
> I hope the suggested project can be done within three months; I cannot
> afford more time. So the project should not be too hard (at the same
> time, it should not be too easy, otherwise I cannot meet the graduation
> requirement :-) )
>
> Thank you!

Hi Xiaofei,
I have three other suggestions:

- Yesterday I got hit by MAPREDUCE-1283 [1]. This issue by itself is of
  course not enough for three months, but my idea is that you could take
  a general look at the developer tools: what is missing and what needs
  improvement.

- HBasene [2] is a project to store a Lucene index natively on HBase. It
  was inspired by Lucandra [3]. The HBasene project has stalled, but it
  would still be a very promising project IMHO, especially considering
  the upcoming talk on Google's new indexing infrastructure, Percolator
  [4], which uses BigTable to store the index.

- A backup system on top of HBase. It should try to store similar files
  near each other so that tablet compression works best. HBase's
  timestamps could be used to hold several versions of a file, letting
  HBase handle the expiration of old versions. As an additional task you
  could evaluate the feasibility of running HBase as a backup system
  with office desktop computers as region servers. This could utilize
  otherwise unused hard drive space.

[1] https://issues.apache.org/jira/browse/MAPREDUCE-1283
[2] http://github.com/akkumar/hbasene
[3] http://github.com/tjake/Lucandra
[4] http://www.theregister.co.uk/2010/09/24/google_percolator/

Hope I could help,

Thomas Koch, http://www.koch.ro
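To make the third suggestion concrete: the version semantics HBase gives you for free can be sketched in plain Java. Note this is an invented illustration, not the HBase client API; in real HBase each cell keeps up to a configured number of timestamped versions per column family, and expiration of excess or expired versions happens during compaction.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model of one HBase cell holding several versions of a
// backed-up file, keyed by timestamp. Names are invented for illustration.
class VersionedCell {
    static final int MAX_VERSIONS = 3; // analogous to a column family's VERSIONS setting

    private final NavigableMap<Long, byte[]> versions = new TreeMap<>();

    void put(long timestamp, byte[] value) {
        versions.put(timestamp, value);
        // Expire the oldest versions beyond the retention limit,
        // roughly what HBase does during compaction.
        while (versions.size() > MAX_VERSIONS) {
            versions.pollFirstEntry();
        }
    }

    byte[] getLatest() {
        return versions.isEmpty() ? null : versions.lastEntry().getValue();
    }

    // Most recent version at or before the given timestamp, i.e. a
    // point-in-time read of the backup.
    byte[] getAsOf(long timestamp) {
        Map.Entry<Long, byte[]> e = versions.floorEntry(timestamp);
        return e == null ? null : e.getValue();
    }

    int versionCount() {
        return versions.size();
    }
}
```

The point of the sketch is only that a timestamp-ordered map per cell gives you both "latest version" and "state as of time T" queries, while a retention limit bounds storage; HBase implements exactly this bookkeeping for you, so the backup system only has to choose row keys and timestamps.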