Hello all,

I am working on a P2P storage project for research purposes, and I want to use the HDFS DataNode as part of it. One possibility is to use only the DataNode as a storage engine and do everything else at an upper level. In that case, all metadata management and the replication mechanism would live at the upper level, and the DataNode would only store data on each node.
The second possibility is to also use the NameNode for metadata management and modify it to fit my project. I have been trying to figure out where to start:

How modular is HDFS? Can I use the DataNode on its own and modify it to fit my project? What are the inputs and outputs of the DataNode? Where should I start? And if I decide to use the NameNode as well, where should I start?

Any comments or help would be appreciated.

Thanks,
Yasin Celik