Thanks Stack,

My guess is that the loadtable script is assigning the first key wrongly; it is not an issue with HFileOutputFormat. Maybe the first key (the second region's start key) should be 33338d45cc2491b8, not 333305184e0f7c3e. Anyway, I will wait for your analysis tomorrow.
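The boundary assignment suggested here can be sketched as follows: each region after the first takes its start key from the first row key of its own HFile, and each region ends (exclusively) where the next one starts. This is a minimal sketch under stated assumptions, not the actual loadtable.rb logic; the class and helper names (RegionBoundaries, FileRange, fromFiles) are hypothetical, and the input files are assumed to be already globally sorted and non-overlapping.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not loadtable.rb): derive region boundaries from HFiles
// that are already globally sorted, using each file's own first row key as the
// start key of the region that file will serve.
final class RegionBoundaries {

    // First and last row key of one HFile (hypothetical helper type).
    static final class FileRange {
        final String firstKey;
        final String lastKey;

        FileRange(String firstKey, String lastKey) {
            this.firstKey = firstKey;
            this.lastKey = lastKey;
        }
    }

    // Region boundaries; an empty string means "unbounded" on that side.
    static final class Region {
        final String startKey;
        final String endKey;

        Region(String startKey, String endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }
    }

    static List<Region> fromFiles(List<FileRange> sortedFiles) {
        List<Region> regions = new ArrayList<>();
        for (int i = 0; i < sortedFiles.size(); i++) {
            // The first region starts at the empty key so it also covers any row
            // that sorts before the first file's first key.
            String start = (i == 0) ? "" : sortedFiles.get(i).firstKey;
            // A region ends (exclusively) where the next region starts; the last
            // region is open-ended.
            String end = (i + 1 < sortedFiles.size()) ? sortedFiles.get(i + 1).firstKey : "";
            regions.add(new Region(start, end));
        }
        return regions;
    }

    public static void main(String[] args) {
        // First/last keys of the two test12 HFiles quoted in this thread.
        List<FileRange> files = List.of(
                new FileRange("00000d7d4f36c112", "333305184e0f7c3e"),
                new FileRange("33338d45cc2491b8", "666629fe4378c096"));
        for (Region r : fromFiles(files)) {
            System.out.println("start='" + r.startKey + "' end='" + r.endKey + "'");
        }
    }
}
```

With the test12 files quoted further down, this prints start='' end='33338d45cc2491b8' and then start='33338d45cc2491b8' end='', so row 333305184e0f7c3e would stay in the first region, next to the file that actually holds it.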
Thanks for the support,
Murali Krishna

________________________________
From: stack <[email protected]>
To: [email protected]
Sent: Sun, 8 November, 2009 12:47:25 PM
Subject: Re: Issue with bulk loader tool

Ok. Thanks. Looks like a dumb bug in the HFileOutputFormat. I'll check tomorrow. Thanks for your patience.
St.Ack

On Sat, Nov 7, 2009 at 11:06 PM, Murali Krishna. P <[email protected]> wrote:

> No, we are not dropping it. It is going to the previous region's last
> entry. So, the last key is inclusive but the first key is exclusive.
>
> Look at my test code:
>
> HFile.Reader reader = new HFile.Reader(fs, new Path(args[0]), null, true);
> reader.loadFileInfo();
> System.out.println("FirstKey:" + new String(reader.getFirstKey()));
> System.out.println("LastKey:" + new String(reader.getLastKey()));
> HFileScanner l = reader.getScanner();
> l.seekTo(reader.getLastKey());
> KeyValue t = l.getKeyValue();
> System.out.println("last key:" + t.getKeyString() + " last value length:" + t.getValueLength() + " value:" + t.getValue());
>
> and the output is:
>
> FirstKey:00000d7d4f36c112imagevalue�������
> LastKey:333305184e0f7c3eimagevalue�������
> last key:\x00\x10333305184e0f7c3e\x05imagevalue\x7F\xFF\xFF\xFF\xFF\xFF\xFF\xFF\x04 last value length:3398 value:[...@8888e6c
>
> Thanks,
> Murali Krishna
>
> ________________________________
> From: stack <[email protected]>
> To: [email protected]
> Sent: Sun, 8 November, 2009 12:21:48 PM
> Subject: Re: Issue with bulk loader tool
>
> So, do you think we are dropping the first key in the region?
> Thanks,
> St.Ack
>
> On Sat, Nov 7, 2009 at 9:17 PM, Murali Krishna. P <[email protected]> wrote:
>
> > No, the first key is 6666909d611e8d7e for the region whose startKey is
> > 666629fe4378c096 (this is actually the next key in the order).
> >
> > HFile -p:
> > Scanning -> /hbase/test12/336573097/image/2362265315474952099
> > K: \x00\x106666909d611e8d7e\x05imagevalue\x7F\x..
> >
> > HFileUtil /hbase/test12/336573097/image/2362265315474952099:
> > FirstKey:6666909d611e8d7eimagevalue�������
> > LastKey:99998c8f356b0d86imagevalue�������
> >
> > But a scan of .META. shows the start key as 666629fe4378c096 (.META.
> > listing attached).
> >
> > This seems to be the case for all the regions: the actual first key is
> > the one after the claimed first key.
> >
> > I am on Hadoop 0.20.0.
> >
> > Thanks,
> > Murali Krishna
> >
> > ------------------------------
> > *From:* stack <[email protected]>
> > *To:* [email protected]
> > *Sent:* Sun, 8 November, 2009 4:30:15 AM
> > *Subject:* Re: Issue with bulk loader tool
> >
> > It's what Lars says, Murali: a region's start key is inclusive and its end
> > key exclusive. If a key exists, it should be in the region that has it as
> > its start key (it will not be duplicated in both).
> >
> > For .META., there is usually only one region instance in the .META.
> > table. Its start key will be the empty key, so it's not surprising that
> > its first key is different from the empty key. What do you see when you
> > look at the second region in your just-uploaded table? I'd expect the key
> > 666629fe4378c096 to be first in the region whose start key is
> > 666629fe4378c096.
> >
> > Thanks for figuring out that MAPREDUCE-565 could trip us up. Your Hadoop
> > is not 0.20.1?
> >
> > Yours,
> > St.Ack
> >
> > On Sat, Nov 7, 2009 at 7:58 AM, Murali Krishna. P <[email protected]> wrote:
> >
> > > Thanks Lars for the clarification.
> > > But where does the record reside? Is it duplicated in both regions?
> > > When I use HFile.Reader, the first key in the second region is
> > > different. Maybe this behaviour (overlap) is only in .META.?
> > > The issue is that when I request that boundary record, it is loading
> > > the next region.
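The lookup rule stack and Lars describe (start key inclusive, end key exclusive) means a row belongs to the last region whose start key is less than or equal to it. A minimal sketch of that rule, assuming region start keys are given as a sorted array with the empty key first; the class name RegionLocator is hypothetical and this is not HBase's client code. Strings stand in for HBase's unsigned byte[] row keys, which orders ASCII hex keys like these the same way.

```java
import java.util.Arrays;

// Hypothetical sketch of the lookup rule described in this thread: a region
// covers [startKey, endKey), so a row belongs to the last region whose start
// key is <= the row. Strings stand in for HBase's unsigned byte[] keys.
final class RegionLocator {

    // startKeys must be sorted ascending, with "" (the empty key) first.
    static int locate(String[] startKeys, String row) {
        int idx = Arrays.binarySearch(startKeys, row);
        if (idx >= 0) {
            // The row equals a start key; start keys are inclusive.
            return idx;
        }
        // binarySearch returns -(insertionPoint) - 1 when there is no exact match;
        // the owning region is the one just before the insertion point.
        int insertionPoint = -idx - 1;
        return insertionPoint - 1;
    }

    public static void main(String[] args) {
        // Region start keys of test12 as reported in this thread.
        String[] starts = {"", "333305184e0f7c3e", "666629fe4378c096"};
        for (String row : new String[] {"00000d7d4f36c112", "333305184e0f7c3e", "33338d45cc2491b8"}) {
            System.out.println(row + " -> region " + locate(starts, row));
        }
    }
}
```

With the test12 boundaries, locate returns 1 for 333305184e0f7c3e, so a get for that row is routed to the second region, even though the HFileUtil output in this thread shows that row as the last key of the first region's file.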
> > >
> > > 09/11/07 07:52:05 DEBUG client.HConnectionManager$TableServers: Cached location address: 76.13.20.58:60020, regioninfo: REGION => {NAME => '.META.,,1', STARTKEY => '', ENDKEY => '', ENCODED => 1028785192, TABLE => {{NAME => '.META.', IS_META => 'true', MEMSTORE_FLUSHSIZE => '16384', FAMILIES => [{NAME => 'historian', VERSIONS => '2147483647', COMPRESSION => 'NONE', TTL => '604800', BLOCKSIZE => '8192', IN_MEMORY => 'false', BLOCKCACHE => 'false'}, {NAME => 'info', VERSIONS => '10', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '8192', IN_MEMORY => 'false', BLOCKCACHE => 'false'}]}}
> > > 09/11/07 07:52:05 DEBUG client.HConnectionManager$TableServers: Cached location address: 76.13.20.114:60020, regioninfo: REGION => {NAME => 'test12,333305184e0f7c3e,1257515988652', STARTKEY => '333305184e0f7c3e', ENDKEY => '666629fe4378c096', ENCODED => 170637321, TABLE => {{NAME => 'test12', FAMILIES => [{NAME => 'image', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
> > >
> > > Thanks,
> > > Murali Krishna
> > >
> > > ________________________________
> > > From: Lars George <[email protected]>
> > > To: "[email protected]" <[email protected]>
> > > Sent: Sat, 7 November, 2009 9:19:37 PM
> > > Subject: Re: Issue with bulk loader tool
> > >
> > > Hi Murali,
> > >
> > > What you see is normal: the last keys do indeed overlap. The last (end)
> > > key of a region is exclusive and marks the first key of the subsequent
> > > region.
> > >
> > > Lars
> > >
> > > On Nov 7, 2009, at 9:05, "Murali Krishna. P" <[email protected]> wrote:
> > >
> > > > Hi,
> > > > I got it resolved. https://issues.apache.org/jira/browse/HADOOP-5750 was
> > > > causing this: even though I supplied a custom total-ordering
> > > > partitioner, it didn't use it.
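Because HADOOP-5750 meant the custom total-ordering partitioner was silently ignored, it can help to check the reducer outputs directly. A minimal sketch, assuming each output file's first and last row key is already known (for example from HFileUtil output like that in this thread) and listed in reducer order: the files form one globally sorted sequence only if each file's last key sorts strictly before the next file's first key. The class name TotalOrderCheck is hypothetical, and Strings stand in for unsigned byte keys.

```java
// Hypothetical sketch: given each reducer output file's first and last row key,
// in reducer order, report the first place where the files stop forming one
// globally sorted, non-overlapping sequence. Strings stand in for unsigned byte keys.
final class TotalOrderCheck {

    // ranges[i] = {firstKey, lastKey} of file i. Returns -1 if ordered, else the
    // index of the first offending file.
    static int firstViolation(String[][] ranges) {
        for (int i = 0; i < ranges.length; i++) {
            String first = ranges[i][0];
            String last = ranges[i][1];
            // A single file must itself be sorted.
            if (first.compareTo(last) > 0) {
                return i;
            }
            // The next file must start strictly after this one ends.
            if (i + 1 < ranges.length && last.compareTo(ranges[i + 1][0]) >= 0) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // The two test12 files from this thread: correctly ordered.
        String[][] test12 = {
            {"00000d7d4f36c112", "333305184e0f7c3e"},
            {"33338d45cc2491b8", "666629fe4378c096"},
        };
        // Hypothetical output of a job whose partitioner was ignored: both
        // reducers received keys from across the whole key space.
        String[][] unordered = {
            {"00000d7d4f36c112", "fffe9c7f87c8332a"},
            {"000011d1bc8cd6fe", "fffef7a12ea22498"},
        };
        System.out.println(firstViolation(test12));     // -1
        System.out.println(firstViolation(unordered));  // 0
    }
}
```

The test12 files pass this check, which is consistent with the later messages: once the partitioner problem was fixed, the file order was fine and the remaining problem was the region boundaries.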
> > > >
> > > > Now the regions look properly sorted, but I am facing a new issue. The
> > > > last key of each region is not retrievable. The table.jsp page shows
> > > > the start and end keys wrongly. For example, take the first 2 regions:
> > > >
> > > > region1: start: end: 333305184e0f7c3e
> > > > region2: start: 333305184e0f7c3e end: 666629fe4378c096
> > > >
> > > > The end key of the first region = the start key of the second??
> > > >
> > > > If I get the first and last keys using HFile.Reader, it shows the
> > > > following:
> > > >
> > > > HFileUtil /hbase/test12/98766318/image/9052388247118781160
> > > > FirstKey:00000d7d4f36c112imagevalue�������
> > > > LastKey:333305184e0f7c3eimagevalue�������
> > > >
> > > > HFileUtil /hbase/test12/170637321/image/7602871928600243730
> > > > FirstKey:33338d45cc2491b8imagevalue�������
> > > > LastKey:666629fe4378c096imagevalue�������
> > > >
> > > > So, according to this, the first key of the 2nd region is
> > > > 33338d45cc2491b8, not 333305184e0f7c3e, which is correct!
> > > >
> > > > Now when I do a get on 333305184e0f7c3e with debug on, it is loading
> > > > the second region, which is wrong!
> > > >
> > > > Did something go wrong with the index?
> > > >
> > > > Thanks,
> > > > Murali Krishna
> > > >
> > > > ________________________________
> > > > From: stack <[email protected]>
> > > > To: [email protected]
> > > > Sent: Sat, 7 November, 2009 6:26:03 AM
> > > > Subject: Re: Issue with bulk loader tool
> > > >
> > > > On Fri, Nov 6, 2009 at 12:58 AM, Murali Krishna. P <[email protected]> wrote:
> > > >
> > > >> Hi,
> > > >> If I increase hbase.hregion.max.filesize so that all the records fit
> > > >> in one region (and one reducer), all the records are retrievable. If
> > > >> one reducer creates multiple HFiles, or multiple reducers create one
> > > >> HFile each, the problem occurs.
> > > >>
> > > > Multiple HFiles in a region?
> > > > Or are you saying that a reducer creates multiple regions? There is
> > > > supposed to be only one file per region when done.
> > > >
> > > > Thanks for digging in,
> > > > St.Ack
> > > >
> > > >> Does that give any clue?
> > > >>
> > > >> Thanks,
> > > >> Murali Krishna
> > > >>
> > > >> ________________________________
> > > >> From: Murali Krishna. P <[email protected]>
> > > >> To: [email protected]
> > > >> Sent: Thu, 5 November, 2009 6:34:20 PM
> > > >> Subject: Re: Issue with bulk loader tool
> > > >>
> > > >> Hi Stack,
> > > >> Sorry, I could not look into this last week...
> > > >>
> > > >> I have a problem with the HTable interface as well: some records I
> > > >> cannot retrieve from HTable either.
> > > >> I lost the old table, but reproduced the problem with a different table.
> > > >>
> > > >> I cannot send the region since it is huge; I will try to give as much
> > > >> info as possible here :)
> > > >>
> > > >> There are 5 regions in total in that table:
> > > >>
> > > >> Name                                  Encoded Name  Start Key         End Key
> > > >> test1,,1257414794600                  106817540                       fffe9c7f87c8332a
> > > >> test1,fffe9c7f87c8332a,1257414794616  1346846599    fffe9c7f87c8332a  fffebe279c0ac4d2
> > > >> test1,fffebe279c0ac4d2,1257414794628  1835851728    fffebe279c0ac4d2  fffec418284d6fbc
> > > >> test1,fffec418284d6fbc,1257414794637  1078205908    fffec418284d6fbc  fffef7a12ea22498
> > > >> test1,fffef7a12ea22498,1257414794647  1515378663    fffef7a12ea22498
> > > >>
> > > >> I am looking for a key, say 000011d1bc8cd6fe. This should be in the
> > > >> first region?
> > > >>
> > > >> Using the HFile tool:
> > > >> org.apache.hadoop.hbase.io.hfile.HFile -k -f /hbase/test1/106817540/image/3828859735461759684 -v -m -p | grep 000011d1bc8cd6fe
> > > >> The first region doesn't have it. Not sure what happened to that record.
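A quick way to spot a record stored outside the region that clients will search: every key in a region's HFile should satisfy start <= key < end, with an empty start or end key meaning unbounded. Because an HFile is sorted, checking its first and last keys is enough. A minimal sketch; the class name RegionFileCheck is hypothetical, and Strings again stand in for HBase's unsigned byte[] keys.

```java
// Hypothetical sketch: does a sorted HFile spanning [firstKey, lastKey] lie
// entirely inside its region [startKey, endKey)? Empty start/end keys mean
// unbounded. Strings stand in for HBase's unsigned byte[] keys.
final class RegionFileCheck {

    static boolean fileFitsRegion(String startKey, String endKey, String firstKey, String lastKey) {
        // Start keys are inclusive.
        boolean startOk = startKey.isEmpty() || firstKey.compareTo(startKey) >= 0;
        // End keys are exclusive, so the file's last key must sort strictly before it.
        boolean endOk = endKey.isEmpty() || lastKey.compareTo(endKey) < 0;
        return startOk && endOk;
    }

    public static void main(String[] args) {
        // test12 region 1 as reported: ["", 333305184e0f7c3e), file ends at 333305184e0f7c3e.
        System.out.println(fileFitsRegion("", "333305184e0f7c3e",
                "00000d7d4f36c112", "333305184e0f7c3e"));  // false
        // test12 region 2 as reported: [333305184e0f7c3e, 666629fe4378c096).
        System.out.println(fileFitsRegion("333305184e0f7c3e", "666629fe4378c096",
                "33338d45cc2491b8", "666629fe4378c096"));  // false
    }
}
```

For both test12 regions quoted in this thread, the file's last key equals the region's exclusive end key, so the check fails. That matches the reported symptom that the last key of each region is not retrievable.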
> > > >>
> > > >> For a working key, it gives the record properly, as below:
> > > >> K: \x00\x100003bdd08ca88ee2\x05imagevalue\x7F\xFF\xFF\xFF\xFF\xFF\xFF\xFF\x04
> > > >> V: \xFF...
> > > >>
> > > >> Please let me know if you need more information.
> > > >>
> > > >> Thanks,
> > > >> Murali Krishna
> > > >>
> > > >> ________________________________
> > > >> From: stack <[email protected]>
> > > >> To: [email protected]
> > > >> Sent: Mon, 2 November, 2009 11:05:43 PM
> > > >> Subject: Re: Issue with bulk loader tool
> > > >>
> > > >> Murali:
> > > >>
> > > >> Any developments worth mentioning?
> > > >>
> > > >> St.Ack
> > > >>
> > > >> On Fri, Oct 30, 2009 at 10:14 AM, stack <[email protected]> wrote:
> > > >>
> > > >>> That is interesting. It'd almost point to a shell issue. Enable DEBUG
> > > >>> so the client can see it, then rerun the shell. Is it at least loading
> > > >>> the right region? (Do the region's start and end keys span the
> > > >>> asked-for key?) I took a look at your attached .META. scan. All looks
> > > >>> good there; the region specifications look right. If you want to
> > > >>> bundle up the region that is failing (the one the failing key comes out
> > > >>> of), I can take a look here. You could also try playing with the HFile
> > > >>> tool: ./bin/hbase org.apache.hadoop.hbase.io.hfile.HFile. Run that
> > > >>> command as-is and it'll output usage. You should be able to get it to
> > > >>> dump the content of the region (you need to supply flags like -v to
> > > >>> see actual keys; otherwise the HFile tool just runs its check
> > > >>> silently). Check for your key. Check things like the timestamp on it.
> > > >>> Maybe it's 100 years in advance of now or something?
> > > >>>
> > > >>> Yours,
> > > >>> St.Ack
> > > >>>
> > > >>> On Fri, Oct 30, 2009 at 9:01 AM, Murali Krishna. P <[email protected]> wrote:
> > > >>>
> > > >>>> Attached ".META."
> > > >>>>
> > > >>>> Interesting: I was able to get the row from HTable via Java code. But
> > > >>>> from the shell, I am still getting the following:
> > > >>>>
> > > >>>> hbase(main):004:0> get 'TestTable2', 'ffffef95bcbf2638'
> > > >>>> 0 row(s) in 1.2250 seconds
> > > >>>>
> > > >>>> Thanks,
> > > >>>> Murali Krishna
> > > >>>>
> > > >>>> ------------------------------
> > > >>>> *From:* stack <[email protected]>
> > > >>>> *To:* [email protected]
> > > >>>> *Sent:* Fri, 30 October, 2009 8:39:46 PM
> > > >>>> *Subject:* Re: Issue with bulk loader tool
> > > >>>>
> > > >>>> Can you send a listing of ".META."?
> > > >>>>
> > > >>>> hbase> scan ".META."
> > > >>>>
> > > >>>> Also, can you bring a region down from HDFS, tar and gzip it, and then
> > > >>>> put it someplace I can pull from so I can take a look?
> > > >>>>
> > > >>>> Thanks,
> > > >>>> St.Ack
> > > >>>>
> > > >>>> On Fri, Oct 30, 2009 at 3:31 AM, Murali Krishna. P <[email protected]> wrote:
> > > >>>>
> > > >>>>> Hi guys,
> > > >>>>> I created a table according to HBASE-48: a MapReduce job that creates
> > > >>>>> HFiles, and then the loadtable.rb script to create the table.
> > > >>>>> Everything worked fine and I was able to scan the table. But when I do
> > > >>>>> a get for a key displayed in the scan output, it does not retrieve the
> > > >>>>> row; the shell says 0 rows.
> > > >>>>>
> > > >>>>> I tried using one reducer to ensure total ordering, but I still get
> > > >>>>> the same issue.
> > > >>>>>
> > > >>>>> My mapper is like:
> > > >>>>>
> > > >>>>> context.write(new ImmutableBytesWritable(((Text) key).toString().getBytes()),
> > > >>>>>     new KeyValue(((Text) key).toString().getBytes(), "family1".getBytes(),
> > > >>>>>         "column1".getBytes(), getValueBytes()));
> > > >>>>>
> > > >>>>> Please help me investigate this.
> > > >>>>>
> > > >>>>> Thanks,
> > > >>>>> Murali Krishna
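Stack suggested checking the timestamp on the stored keys. The K: dumps in this thread can be read field by field, assuming the KeyValue key layout <2-byte row length><row><1-byte family length><family><qualifier><8-byte timestamp><1-byte type>. In the dumps, \x00\x10 is a 16-byte row length, \x05 a 5-byte family "image", and "value" the qualifier. A minimal self-contained sketch with no HBase dependency follows. The names KeyDecoder, encode and decode are hypothetical, and this sketch is not HBase's own parser.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Sketch (no HBase dependency): decode the key part of a KeyValue, assuming the
// layout <2-byte row length><row><1-byte family length><family><qualifier>
// <8-byte timestamp><1-byte type>, big-endian. Names are hypothetical.
final class KeyDecoder {

    static final class DecodedKey {
        final String row;
        final String family;
        final String qualifier;
        final long timestamp;
        final int type;

        DecodedKey(String row, String family, String qualifier, long timestamp, int type) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
            this.type = type;
        }
    }

    static DecodedKey decode(byte[] key) {
        ByteBuffer buf = ByteBuffer.wrap(key); // ByteBuffer defaults to big-endian
        byte[] row = new byte[buf.getShort() & 0xFFFF];
        buf.get(row);
        byte[] family = new byte[buf.get() & 0xFF];
        buf.get(family);
        // The qualifier has no length prefix: it is whatever remains before the
        // fixed 9-byte trailer (8-byte timestamp + 1-byte type).
        byte[] qualifier = new byte[buf.remaining() - 9];
        buf.get(qualifier);
        long timestamp = buf.getLong();
        int type = buf.get() & 0xFF;
        return new DecodedKey(
                new String(row, StandardCharsets.UTF_8),
                new String(family, StandardCharsets.UTF_8),
                new String(qualifier, StandardCharsets.UTF_8),
                timestamp, type);
    }

    // Builds a key with the same assumed layout, used here to reproduce the dumps.
    static byte[] encode(String row, String family, String qualifier, long timestamp, int type) {
        byte[] r = row.getBytes(StandardCharsets.UTF_8);
        byte[] f = family.getBytes(StandardCharsets.UTF_8);
        byte[] q = qualifier.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(2 + r.length + 1 + f.length + q.length + 8 + 1);
        buf.putShort((short) r.length).put(r).put((byte) f.length).put(f).put(q)
                .putLong(timestamp).put((byte) type);
        return buf.array();
    }

    public static void main(String[] args) {
        // Same bytes as the "working key" dump:
        // \x00\x10 0003bdd08ca88ee2 \x05 image value \x7F\xFF...\xFF \x04
        byte[] key = encode("0003bdd08ca88ee2", "image", "value", Long.MAX_VALUE, 4);
        DecodedKey k = decode(key);
        System.out.println(k.row + " " + k.family + ":" + k.qualifier
                + " ts=" + k.timestamp + " type=" + k.type);
    }
}
```

This prints 0003bdd08ca88ee2 image:value ts=9223372036854775807 type=4. The \x7F\xFF...\xFF trailer is Long.MAX_VALUE, which is HBase's LATEST_TIMESTAMP, and type 4 is HBase's code for a Put. That looks like what a KeyValue built without an explicit timestamp (as in the mapper above) ends up with, though this is worth confirming against the 0.20 source.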
