Oops, didn't mean to post that here.

From: David Capwell <[email protected]>
Date: Tue, 7 Feb 2012 23:50:59 -0800
To: "Yahoo! Inc." <[email protected]>, "[email protected]" <[email protected]>
Subject: Random Reads

So I saw that you posted http://twiki.corp.yahoo.com/view/Grid/ProjectHBaseRandomReadFramework on the Apache mailing list and took a quick glance at it.

Why do you attach the random reads to the output format? This doesn't seem right to me.

You refer to "tableAlias" a few times in the doc. Can you explain why it is needed and why Hive's database and table names will not work?

"getRandomReader(Configuration conf, String tableAlias, String keyFieldName)": what is the point of the keyFieldName variable? How is it different from RandomReader.getRecord(Object)?

OutputJobInfo is going away with 0.4, so why have a dependency here that is going to break in ... now?

Can you show an example MR job and the phases in which you expect each method to be called?
