Re: MapFile and MapFileOutputFormat

Rong-en Fan Fri, 21 Mar 2008 00:10:12 -0700

On Fri, Mar 21, 2008 at 12:42 AM, Doug Cutting <[EMAIL PROTECTED]> wrote:
> Rong-en Fan wrote:
>  > I have two questions regarding MapFile in Hadoop/HDFS. First, when using
>  > MapFileOutputFormat as the reducer's output, is there any way to change
>  > the index interval (i.e., to call setIndexInterval() on the
>  > output MapFile)?
>
>  Not at present.  It would probably be good to change MapFile to get this
>  value from the Configuration.  A static method could be added,
>  MapFile#setIndexInterval(Configuration conf, int interval), that sets
>  "io.mapfile.index.interval", and the MapFile constructor could read this
>  property from the Configuration.  One could then use the static method
>  to set this on jobs.
>
>  If you need this, please file an issue in Jira.  If possible, include a
>  patch too.
>
>  http://wiki.apache.org/hadoop/HowToContribute


Thanks, I will consider this.
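Doug's proposal is a static setter that stores the interval in the job's Configuration and a MapFile constructor that reads it back. The following is a minimal, self-contained sketch of that pattern, not Hadoop's actual code. It assumes a plain `Map<String, String>` as a stand-in for `org.apache.hadoop.conf.Configuration`. The class name `IndexIntervalSketch` and the helper `getIndexInterval` are hypothetical. The property name is the one proposed above. The fallback of 128 matches MapFile's usual default index interval.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the proposal: a static helper records the index
 * interval in the job configuration, and the MapFile writer reads it back.
 * A plain Map stands in for org.apache.hadoop.conf.Configuration.
 */
public class IndexIntervalSketch {
    // Property name as proposed in the thread.
    static final String INDEX_INTERVAL = "io.mapfile.index.interval";
    // Fallback used when the job does not set the property.
    static final int DEFAULT_INTERVAL = 128;

    // Proposed static setter: stores the interval on the job's configuration.
    static void setIndexInterval(Map<String, String> conf, int interval) {
        conf.put(INDEX_INTERVAL, Integer.toString(interval));
    }

    // What a MapFile writer constructor could do: read the property or use the default.
    static int getIndexInterval(Map<String, String> conf) {
        String value = conf.get(INDEX_INTERVAL);
        return value == null ? DEFAULT_INTERVAL : Integer.parseInt(value);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(getIndexInterval(conf)); // 128
        setIndexInterval(conf, 1);
        System.out.println(getIndexInterval(conf)); // 1
    }
}
```

Routing the setting through the configuration means every reduce task's MapFileOutputFormat would pick up the same interval without any per-writer API call.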

>  > Second, is it possible to tell the position in the data file of a given
>  > key, assuming the index interval is 1 and the number of keys is small?
>
>  One could read the "index" file explicitly.  It's just a SequenceFile,
>  listing keys and positions in the "data" file.  But why would you set
>  the index interval to 1?  And why do you need to know the position?
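Reading the "index" file amounts to a sorted list of (key, data-file position) pairs. A lookup finds the last indexed key that is less than or equal to the target and seeks there. The sketch below is a simplified in-memory model of that lookup, not the Hadoop reader. The class `IndexLookupSketch` and the sample offsets are hypothetical. A real program would fill the lists by reading the index SequenceFile. With an index interval of 1, every key is indexed, so an exact match returns that record's own position.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified in-memory model of a MapFile "index": sorted keys paired with
 * their positions in the "data" file. Lookup returns the position of the
 * last indexed key <= target; a real reader would seek there and scan forward.
 */
public class IndexLookupSketch {
    final List<String> keys = new ArrayList<>();
    final List<Long> positions = new ArrayList<>();

    // Mirrors one (key, position) entry read from the index file, in sorted order.
    void add(String key, long position) {
        keys.add(key);
        positions.add(position);
    }

    // Binary search for the last key <= target; -1 if target precedes every indexed key.
    long seekPosition(String target) {
        int lo = 0, hi = keys.size() - 1, best = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (keys.get(mid).compareTo(target) <= 0) {
                best = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return best < 0 ? -1 : positions.get(best);
    }

    public static void main(String[] args) {
        IndexLookupSketch index = new IndexLookupSketch();
        // Hypothetical positions; with an index interval of 1, every key appears here.
        index.add("apple", 100L);
        index.add("banana", 180L);
        index.add("cherry", 260L);
        System.out.println(index.seekPosition("banana"));    // 180 (exact match)
        System.out.println(index.seekPosition("blueberry")); // 180 (scan forward from "banana")
        System.out.println(index.seekPosition("aardvark"));  // -1
    }
}
```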

I want to move my computation to the datanode that holds my data.
Since launching a MapReduce job carries some overhead, I want to
run a persistent daemon on each datanode to do the computation.
Any suggestions?

Regards,
Rong-En Fan
