On Fri, Mar 21, 2008 at 12:42 AM, Doug Cutting <[EMAIL PROTECTED]> wrote:
> Rong-en Fan wrote:
> > I have two questions regarding the mapfile in hadoop/hdfs. First, when using
> > MapFileOutputFormat as reducer's output, is there any way to change
> > the index interval (i.e., able to call setIndexInterval() on the
> > output MapFile)?
>
> Not at present. It would probably be good to change MapFile to get this
> value from the Configuration. A static method could be added,
> MapFile#setIndexInterval(Configuration conf, int interval), that sets
> "io.mapfile.index.interval", and the MapFile constructor could read this
> property from the Configuration. One could then use the static method
> to set this on jobs.
>
> If you need this, please file an issue in Jira. If possible, include a
> patch too.
>
> http://wiki.apache.org/hadoop/HowToContribute
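Doug's proposal (a static setter that stores the interval under "io.mapfile.index.interval", plus a constructor that reads it back from the Configuration) can be sketched without Hadoop on the classpath. `Conf` and `IndexedWriterSketch` are hypothetical stand-ins for Hadoop's `Configuration` and the MapFile writer, and the default interval of 128 is an assumption for illustration; this is not the actual MapFile code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Hadoop's Configuration: a simple string key/value store.
class Conf {
    private final Map<String, String> props = new HashMap<>();

    void set(String key, String value) { props.put(key, value); }

    int getInt(String key, int defaultValue) {
        String v = props.get(key);
        return v == null ? defaultValue : Integer.parseInt(v.trim());
    }
}

// Hypothetical sketch of the proposed change: the writer reads its index
// interval from the configuration instead of needing a setter call on an
// already-constructed MapFile.
class IndexedWriterSketch {
    // Property name as proposed in the thread.
    static final String INDEX_INTERVAL_KEY = "io.mapfile.index.interval";
    // Assumed default for this sketch: one index entry every 128 keys.
    static final int DEFAULT_INDEX_INTERVAL = 128;

    private final int indexInterval;

    // Proposed static helper: record the interval in the job configuration.
    static void setIndexInterval(Conf conf, int interval) {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        conf.set(INDEX_INTERVAL_KEY, Integer.toString(interval));
    }

    // The constructor picks the value up from the configuration.
    IndexedWriterSketch(Conf conf) {
        this.indexInterval = conf.getInt(INDEX_INTERVAL_KEY, DEFAULT_INDEX_INTERVAL);
    }

    int getIndexInterval() { return indexInterval; }

    // Whether the n-th appended key (0-based) would get an index entry.
    boolean isIndexed(long keyNumber) { return keyNumber % indexInterval == 0; }
}

public class IndexIntervalDemo {
    public static void main(String[] args) {
        Conf jobConf = new Conf();
        System.out.println(new IndexedWriterSketch(jobConf).getIndexInterval()); // 128
        IndexedWriterSketch.setIndexInterval(jobConf, 1);
        System.out.println(new IndexedWriterSketch(jobConf).getIndexInterval()); // 1
    }
}
```

The point of the design is that the interval travels with the job configuration, so code that only ever sees a Configuration (such as an output format creating the writer inside a reducer) can honor it without access to the MapFile object itself.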
Thanks, I will consider this.

> > Second, is it possible to tell what is the position in data file for a given
> > key, assuming index interval is 1 and # of keys are small?
>
> One could read the "index" file explicitly. It's just a SequenceFile,
> listing keys and positions in the "data" file. But why would you set
> the index interval to 1? And why do you need to know the position?

I want to move my computation to the datanode that has my data. As there
is some overhead in launching a map-reduce job, I want to run a persistent
daemon on each datanode to do my computation. Any suggestions?

Regards,
Rong-En Fan
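Doug's point that the "index" file is just a SequenceFile of keys and positions in the "data" file can be illustrated with a small stand-in. The sketch does not use Hadoop: a `TreeMap` inside the hypothetical class `IndexLookupSketch` plays the role of the index entries, which in Hadoop itself would be read with `SequenceFile.Reader`. The offsets in `main` are made up for illustration. With an index interval of 1 every key has an entry, so finding its position is an exact lookup; with a larger interval, the last indexed key at or before the target only tells a reader where to start scanning.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, in-memory stand-in for the contents of a MapFile "index" file:
// each entry maps a key to a byte offset in the "data" file.
public class IndexLookupSketch {
    private final TreeMap<String, Long> index = new TreeMap<>();

    void addEntry(String key, long position) { index.put(key, position); }

    // Exact lookup: only reliable when every key is indexed (interval 1).
    // Returns -1 when the key has no index entry.
    long exactPosition(String key) {
        Long pos = index.get(key);
        return pos == null ? -1L : pos;
    }

    // General lookup: offset of the last indexed key <= the target.
    // A reader would seek there in the data file and scan forward.
    // Returns -1 when the target sorts before every indexed key.
    long seekPosition(String key) {
        Map.Entry<String, Long> e = index.floorEntry(key);
        return e == null ? -1L : e.getValue();
    }

    public static void main(String[] args) {
        IndexLookupSketch idx = new IndexLookupSketch();
        // Made-up offsets, for illustration only.
        idx.addEntry("apple", 0L);
        idx.addEntry("banana", 57L);
        idx.addEntry("cherry", 121L);
        System.out.println(idx.exactPosition("banana"));   // 57
        System.out.println(idx.seekPosition("blueberry")); // 57
        System.out.println(idx.seekPosition("aardvark"));  // -1
    }
}
```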
