Oh, I tried HBase earlier. But I think HDFS may still give me an option. Thanks.

On Thu, Oct 29, 2009 at 10:16 AM, Jeff Zhang <[email protected]> wrote:

> I guess HBase may be a good fit for you. HBase is a distributed database
> built upon Hadoop.
> You can use the URL as the row key and put the other fields into columns.
>
> Then you can retrieve a web page through the HBase client API and insert
> new web pages into it. The performance of HBase 0.20 should be good enough
> for you.
>
> Best Regards,
> Jeff Zhang
>
>
> On Thu, Oct 29, 2009 at 8:53 AM, lei wang <[email protected]> wrote:
>
> > Hi Jeff, thanks for your comments.
> > I read this book earlier. I use MapFile to store my web pages for
> > random access.
> > At first I considered the SequenceFile conversion as a solution. However,
> > the problem is that I need to append new pages to the MapFile every
> > minute or even every second, so I don't think the SequenceFile
> > conversion can handle this.
> > Could you give me some suggestions? Thank you very much!
> >
> > Best wishes.
> >
> > On 10/28/09, Jeff Zhang <[email protected]> wrote:
> > > I do not know why you need to use MapFile. Could you use SequenceFile
> > > instead?
> > >
> > > MapFile's advantage is its read performance, because it builds an
> > > index on its keys. So its keys must be in order.
> > >
> > > If you really want to use MapFile, you can first write your data to a
> > > SequenceFile and then convert it to a MapFile.
> > >
> > > How to convert a SequenceFile to a MapFile:
> > > 1. Sort the SequenceFile using the sort program in the Hadoop examples.
> > > 2. Create an index for the output of the above step. You then have
> > >    both the data file and the index file.
> > >
> > > You can refer to Tom White's book "Hadoop: The Definitive Guide" for
> > > details about how to convert a SequenceFile into a MapFile.
> > >
> > > Jeff Zhang
> > >
> > >
> > > On Wed, Oct 28, 2009 at 4:47 PM, lei wang <[email protected]> wrote:
> > >
> > >> But now the "url" keys are not in order. Must the key be an
> > >> IntWritable? Should it be comparable?
> > >> How do I make sure they are in order? Sort them first?
> > >> I just want to insert the pages for random access by "url".
> > >>
> > >> On Wed, Oct 28, 2009 at 4:26 PM, Jeff Zhang <[email protected]> wrote:
> > >>
> > >> > Hi Wang,
> > >> >
> > >> > The keys of a MapFile must be in order, so when you add records to
> > >> > a MapFile, you should make sure you insert them in order.
> > >> >
> > >> > Best Regards,
> > >> >
> > >> > Jeff Zhang
> > >> >
> > >> >
> > >> > On Wed, Oct 28, 2009 at 4:14 PM, lei wang <[email protected]> wrote:
> > >> >
> > >> > > Hi, friends
> > >> > > I need to store web pages (a huge collection) in a Hadoop
> > >> > > MapFile, so I used the URL as the key, and its type is "Text".
> > >> > > When writing the records into the MapFile, it gives an "out of
> > >> > > order" error. Which type should I choose to represent the "url"
> > >> > > key? Can anyone give me a detailed answer? Thanks for your help.
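
The ordering rule discussed in this thread (keys must be appended in sorted order, so an unsorted batch of URLs has to be sorted first) can be illustrated with a small sketch. This is only a toy model in plain Python, not the Hadoop API: `OrderedWriter`, `write_sorted`, and the example URLs are hypothetical names made up for illustration. It assumes keys are compared as UTF-8 bytes, which is how Hadoop's `Text` keys are ordered.

```python
import bisect


class OrderedWriter:
    """Toy stand-in for an append-only sorted file (not the Hadoop API)."""

    def __init__(self):
        self.keys = []    # keys as UTF-8 bytes, in append order
        self.values = []  # page contents, parallel to self.keys

    def append(self, url, page):
        key = url.encode("utf-8")
        # Reject any key that sorts before the previously appended key.
        if self.keys and key < self.keys[-1]:
            raise ValueError(
                "key out of order: %r after %r" % (key, self.keys[-1])
            )
        self.keys.append(key)
        self.values.append(page)

    def get(self, url):
        # Binary search is only valid because the keys are sorted.
        key = url.encode("utf-8")
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return None


def write_sorted(pages):
    """Sort a batch of (url, page) pairs by key bytes, then append in order."""
    writer = OrderedWriter()
    for url, page in sorted(pages, key=lambda p: p[0].encode("utf-8")):
        writer.append(url, page)
    return writer


# Hypothetical crawl batch in arrival order (not sorted by URL).
batch = [
    ("http://example.org/b", "<html>B</html>"),
    ("http://example.com/a", "<html>A</html>"),
    ("http://example.net/c", "<html>C</html>"),
]
store = write_sorted(batch)
```

Appending `batch` directly, without sorting, would raise the "out of order" error on the second record. The sketch covers a single batch only: a page that arrives later with a key that sorts before the last written key would still be rejected, which is the append problem raised in this thread.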
