Hi all,

During our GSoC meeting this morning, Isabel and I discussed a data-type flaw in my algorithm and two possible solutions. One involves rewriting the data types manually and likely adding another M/R task. The other involves using a very recent Hadoop API commit: specifically, the mapreduce.lib.join.* library, which was committed only two months ago and is not available in version 0.20.2.
Isabel mentioned that others may find the API in the latest Hadoop commit useful for their own code. However, since there has been no official release after 0.20.2, integrating the new API would require a bit more finesse. Hence I pose this question to the list: does anyone else need or want to use what is effectively a beta version of Hadoop? If so, I will use it as well, which would greatly simplify my work and reduce the amount of code I have to rewrite. If not, I will write an intermediate task that does all the processing I need.

Thanks for your input!

Regards,
Shannon
