MapFiles are sorted SequenceFiles. If you add new data, you have to merge the existing MapFile with the new set of data to create a consistent new MapFile. A MapFile index only keeps track of where some of the keys are, which makes it possible to binary-search the index in RAM and minimize disk I/O. But since only a subset of the keys is kept in the index, the raw data has to be sorted.
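
To make the sparse-index point concrete, here is a minimal sketch in Python. It is not Hadoop's actual implementation: plain in-memory lists stand in for the on-disk data file, and the names INDEX_INTERVAL, build_sparse_index, lookup and merge_sorted are made up for illustration. It assumes keys are unique within each input and that new data should override old data on a key collision.

```python
import bisect
import heapq

INDEX_INTERVAL = 4  # hypothetical: remember the position of every 4th key


def build_sparse_index(records, interval=INDEX_INTERVAL):
    """Index every `interval`-th key of a key-sorted list of (key, value)."""
    positions = list(range(0, len(records), interval))
    keys = [records[pos][0] for pos in positions]
    return keys, positions


def lookup(records, index, key):
    """Binary-search the small index, then scan forward in the sorted data."""
    keys, positions = index
    slot = bisect.bisect_right(keys, key) - 1
    if slot < 0:
        return None  # key sorts before the first record
    for k, v in records[positions[slot]:]:
        if k == key:
            return v
        if k > key:
            break  # data is sorted, so the key cannot appear later
    return None


def merge_sorted(base, updates):
    """Merge two key-sorted lists into one; on equal keys the update wins."""
    tagged_updates = ((k, 0, v) for k, v in updates)  # tag 0 sorts first
    tagged_base = ((k, 1, v) for k, v in base)
    merged = []
    for k, _, v in heapq.merge(tagged_updates, tagged_base):
        if merged and merged[-1][0] == k:
            continue  # older value for a key that was already emitted
        merged.append((k, v))
    return merged


base = [("a", 1), ("c", 3), ("e", 5), ("g", 7), ("i", 9)]
updates = [("b", 2), ("e", 50)]
merged = merge_sorted(base, updates)
index = build_sparse_index(merged)
```

The scan in lookup only works because the data is sorted: the index narrows the search to a starting position, and the sort order tells the scan when to give up. That is why new records cannot simply be appended to the end of the data.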
Since MapFiles are a specialization of SequenceFiles, you can read them with SequenceFileInputFormat.

There are hybrid approaches if you want to minimize how often you have to re-merge your MapFile. Depending on performance, you could keep "k" MapFiles around and search all "k" simultaneously. Then you only have to merge when you have more than "k" files.

If you actually just need to do MapReduce over large sets of data, skip the MapFile entirely. You only need it if you're doing random lookups (and you shouldn't be doing random lookups within your MapReduce, if at all possible!).

On 11/12/06, Runping Qi <[EMAIL PROTECTED]> wrote:
HDFS is write-once only. That means, once you are done writing to a file, you cannot append to or update it. The normal way to update data is to create new files in the same directory and treat the files in the directory as the same "logical" file. You may also want to periodically merge multiple files into a single file.

Runping

> -----Original Message-----
> From: 张茂森 [mailto:[EMAIL PROTECTED]
> Sent: Sunday, November 12, 2006 7:08 PM
> To: [email protected]
> Subject: How to use MapFile?
>
> Hi all:
>
> Now I want to do some operations like 'update' or 'insert', which can be
> described like this:
>
> 1. I have a base dataset.
>
> 2. Every day I will get more data from other places, and then I want to
> update or insert this new data into my base dataset.
>
> 3. After reading the API doc, I think MapFile is a good way to solve this
> problem. As far as I know, I only need to append my new data at the end of
> the base dataset and update the index file of the MapFile. Do I understand
> correctly?
>
> 4. If I am right, I want to know how to do these operations using MapFile.
>
> Firstly, I could only find MapFileOutputFormat and couldn't find
> MapFileInputFormat, so how do I read the MapFile?
>
> Secondly, how do I update the index and append the data? Do you have some
> experience or samples?
>
> Any suggestion would be appreciated.
>
> Thank you!

-- Bryan A. P. Pendleton Ph: (877) geek-1-bp
