Hi all,

I've been rereading our old discussions about the record converter, superstep injection, and the common module:
- http://markmail.org/message/ol32pp2ixfazcxfc
- http://markmail.org/message/xwtmfdrag34g5xc4

To clarify goals and objectives:

1. Parallel input partitioning is necessary for the scalability and elasticity of Bulk Synchronous Parallel processing. (It is not a memory issue, a Disk/Spilling Queue issue, or HAMA-644. Please don't mix these up.)
2. Input partitioning should be handled at the BSP framework level, and it applies to every Hama job, not only graph jobs.
3. Unnecessary I/O overhead needs to be avoided, and NoSQL inputs should also be considered.

The current problem is that every input of a graph job has to be rewritten to HDFS.

If you have a good idea, please let me know.

--
Best Regards, Edward J. Yoon
@eddieyoon
