Well, Jake has a point: some of these "new" 0.21+ APIs that you (and I) would like to use are actually already in Hadoop, but in the old deprecated .mapred. APIs. And if we're of a mind not to move past 0.20.x, well, that doesn't 100% mean we can't use these functions.
It does mean perhaps using some deprecated code, which is ugly. It's not so very ugly, since these APIs are not only coming back in 0.21 but are even un-deprecated in their old form in later versions. Confusing. I personally would support an "exception" for implementations that use old APIs for this reason.

Keep in mind it is unfortunately hard to mix-n-match APIs. You'll probably have to use all .mapred. stuff if you use any.

It's also possible to rewrite all this on the newer APIs without the MultipleInputs stuff, with some of the kinds of techniques Ted and I mentioned. It'd also be valid to go this way, but I imagine it will be slower, perhaps a lot. And I think that would be good reason not to go this way.

I think I (and likely others) trust your judgment to do what's best here, as you're actively working on this. But if you mean you really do want advice on what's better ... erm, ask Jake?

On Sun, May 22, 2011 at 10:32 PM, Shannon Quinn <[email protected]> wrote:
> But what it sounds like you're saying - cleverness with the keys and
> organization of the input paths has the possibility to keep the process as a
> single job (rather than 3), it'll just be a little "hacky" until some point
> down the road when we switch some later version (0.21? 0.22?) of the API.
> Ultimately, though, we'll have eliminated the 0.18 mapred.* libraries.
>
> Is this what you're getting at?
>
> On 5/22/2011 4:49 PM, Sean Owen wrote:
>>
>> Ah righty -- this exists in the old API doesn't it... even in 0.20.x
>> But it's deprecated. But it's not deprecated in 0.21+.
>>
>> Yes I think there's a strong argument to make use of that even if it
>> is deprecated.
>>
>> I had in mind the unnecessary use of old .mapred. APIs for simple
>> Mappers and Reducers.
>>
>> On Sun, May 22, 2011 at 9:41 PM, Jake Mannix<[email protected]>
>> wrote:
>>>
>>> Wait, are you saying that we should force things like matrix
>>> multiplication to become a 3-job process, instead of the current
>>> 1-job process?
>>>
>>> I thought we've already discussed and decided that moving to 0.20 APIs
>>> where possible should be done, but where it removes functionality and
>>> efficiency, we would allow the old API?
>>>
>>> -jake
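
For reference, a minimal sketch of the "cleverness with the keys" idea from Shannon's message, written in plain Java with no Hadoop classes at all. It is an assumption about what such a technique could look like, not necessarily what Ted or Sean had in mind: rows from two inputs are keyed by row index and tagged with their source, so a single simulated shuffle and reduce can pair them up and accumulate A-transpose times B. The names TaggedJoinSketch, Tagged, mapRows and transposeTimes are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation (no Hadoop classes) of a reduce-side join in which
// every record carries a tag naming the input it came from.
class TaggedJoinSketch {

    // Hypothetical record type: 'source' stands in for the input path.
    static final class Tagged {
        final char source;   // 'A' or 'B'
        final double[] row;
        Tagged(char source, double[] row) {
            this.source = source;
            this.row = row;
        }
    }

    // "Map" phase: key each row by its row index and tag it with its source,
    // so rows from both inputs can flow through one shuffle.
    static void mapRows(char source, double[][] matrix,
                        Map<Integer, List<Tagged>> shuffle) {
        for (int i = 0; i < matrix.length; i++) {
            shuffle.computeIfAbsent(i, k -> new ArrayList<>())
                   .add(new Tagged(source, matrix[i]));
        }
    }

    // "Reduce" phase: for each key, separate the values by tag and add the
    // outer product of the A row and the B row into the result (A^T * B).
    static double[][] transposeTimes(double[][] a, double[][] b) {
        Map<Integer, List<Tagged>> shuffle = new TreeMap<>();
        mapRows('A', a, shuffle);
        mapRows('B', b, shuffle);

        double[][] result = new double[a[0].length][b[0].length];
        for (List<Tagged> group : shuffle.values()) {
            double[] rowA = null;
            double[] rowB = null;
            for (Tagged t : group) {
                if (t.source == 'A') {
                    rowA = t.row;
                } else {
                    rowB = t.row;
                }
            }
            if (rowA == null || rowB == null) {
                continue; // key present in only one input; nothing to join
            }
            for (int i = 0; i < rowA.length; i++) {
                for (int j = 0; j < rowB.length; j++) {
                    result[i][j] += rowA[i] * rowB[j];
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}};
        double[][] b = {{5, 6}, {7, 8}};
        System.out.println(Arrays.deepToString(transposeTimes(a, b)));
    }
}
```

In a real job the tag would have to travel inside the key or value Writable, and every reducer would pay for the extra branching and buffering, which is one plausible reason this route could end up slower than MultipleInputs on the old API.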
