MapFiles are sorted SequenceFiles. If you add new data, you have to merge
(i.e., an existing MapFile and a new set of data) to create a consistent new
MapFile. MapFile indexes just keep track of where some keys are, which makes
it possible to do a binary search in RAM and minimize disk I/O. But since
only a subset of keys is kept in the index, the raw data has to be sorted.
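
To make the index point concrete, here is a rough in-memory sketch of the
idea. It is not the Hadoop MapFile API: the class name, the "interval"
parameter and the recordsScanned counter are all made up for illustration,
and plain arrays stand in for the data and index files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** In-memory stand-in for a sorted data file plus a sparse index (hypothetical, not Hadoop code). */
final class SparseIndexSketch {
    private final String[] keys;          // "data file": every key, sorted ascending
    private final String[] values;        // values parallel to keys
    private final String[] indexKeys;     // "index file": every interval-th key, held in RAM
    private final int[] indexPositions;   // position in the data of each indexed key
    int recordsScanned;                   // records touched by the last get(), for illustration

    SparseIndexSketch(String[] sortedKeys, String[] values, int interval) {
        this.keys = sortedKeys;
        this.values = values;
        List<String> ik = new ArrayList<>();
        List<Integer> ip = new ArrayList<>();
        for (int i = 0; i < sortedKeys.length; i += interval) {
            ik.add(sortedKeys[i]);
            ip.add(i);
        }
        this.indexKeys = ik.toArray(new String[0]);
        this.indexPositions = ip.stream().mapToInt(Integer::intValue).toArray();
    }

    String get(String key) {
        recordsScanned = 0;
        // Binary search in RAM for the last indexed key <= the wanted key.
        int slot = Arrays.binarySearch(indexKeys, key);
        if (slot < 0) {
            slot = -slot - 2;             // insertion point minus one
        }
        if (slot < 0) {
            return null;                  // wanted key sorts before the first record
        }
        // Scan forward in the data from the indexed position.
        for (int i = indexPositions[slot]; i < keys.length; i++) {
            recordsScanned++;
            int cmp = keys[i].compareTo(key);
            if (cmp == 0) {
                return values[i];
            }
            if (cmp > 0) {
                return null;              // stopping early is only valid because the data is sorted
            }
        }
        return null;
    }
}
```

If records were simply appended out of order, the early exit in the scan
would miss keys that are actually present, which is why an append without a
merge does not give you a usable MapFile.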

Since MapFiles are a specialization of SequenceFiles, you can read them with
SequenceFileInputFormat.

There are hybrid approaches, if you want to minimize how often you have to
re-merge your MapFile - depending on performance, you could keep "k"
MapFiles around, and search all "k" simultaneously. Then, you only have to
merge when you have more than "k" files.
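
A minimal sketch of that hybrid scheme, again using only the JDK rather
than Hadoop classes. Each TreeMap stands in for one sorted MapFile, and the
names (KFilesSketch, addBatch, fileCount) are hypothetical. Two things here
are my assumptions rather than anything required: the files are searched
one after another, newest first, instead of simultaneously, and a newer
value for a key replaces an older one.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.TreeMap;

/** Keeps up to k sorted "files" and merges them only when the limit is exceeded (hypothetical). */
final class KFilesSketch {
    private final int k;
    // Newest file sits at the head of the deque.
    private final Deque<TreeMap<String, String>> files = new ArrayDeque<>();

    KFilesSketch(int k) {
        this.k = k;
    }

    /** Adds a new sorted batch as its own file; merges only when more than k files exist. */
    void addBatch(TreeMap<String, String> batch) {
        files.addFirst(batch);
        if (files.size() > k) {
            mergeAll();
        }
    }

    /** Looks the key up in every file, newest first, so updates shadow older values. */
    String get(String key) {
        for (TreeMap<String, String> file : files) {
            String value = file.get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    /** Rewrites all files into one sorted file; applying oldest first lets newer values win. */
    private void mergeAll() {
        TreeMap<String, String> merged = new TreeMap<>();
        Iterator<TreeMap<String, String>> oldestFirst = files.descendingIterator();
        while (oldestFirst.hasNext()) {
            merged.putAll(oldestFirst.next());
        }
        files.clear();
        files.addFirst(merged);
    }

    int fileCount() {
        return files.size();
    }
}
```

The tradeoff is that every lookup may touch up to "k" files, so a larger
"k" means fewer merges but slower reads.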

If you actually just need to do MapReduce over large sets of data, skip the
MapFile entirely - you only need it if you're doing random lookups (and you
shouldn't be doing random lookups within your MapReduce, if at all
possible!).

On 11/12/06, Runping Qi <[EMAIL PROTECTED]> wrote:


HDFS is write-once only. That means, once you are done writing to a file,
you cannot append/update to it.

The normal way to update data is to create new files in the same directory
and treat the files in the directory as the same "logical" file. You may
also want to periodically merge multiple files into a single file.

Runping


> -----Original Message-----
> From: 张茂森 [mailto:[EMAIL PROTECTED]
> Sent: Sunday, November 12, 2006 7:08 PM
> To: [EMAIL PROTECTED]
> Subject: How to use MapFile?
>
>  Hi all:
>
> Now I want to do some operations like 'update' or 'insert', which can
> describe like this:
>
> 1. I have a base dataset
>
> 2. Everyday I will get more data from other places, and then I want to
> update or insert these new data into my base dataset.
>
> 3. After I've read API Doc, I think MapFile is a good way to solve this
> problem. As far as I know, I only need to append my new data at the end of
> base dataset, and update the index file of MapFile. I understand right?
>
> 4.  If I am right, I want to know how to do these operations using
> MapFile.
>
> Firstly, I could only find MapFileOutputFormat and couldn't find
> MapFileInputFormat, so how to read the MapFile?
>
> Secondly, how to update the index and append the data? Do you have some
> experience or samples?
>
> Any suggestion would be appreciated.
>
> Thank you!





--
Bryan A. P. Pendleton
Ph: (877) geek-1-bp
