Ok. Thanks. Looks like a dumb bug in the HFileOutputFormat. I'll check tomorrow. Thanks for your patience. St.Ack
On Sat, Nov 7, 2009 at 11:06 PM, Murali Krishna. P <[email protected]> wrote:

> no, we are not dropping it. It is going to the previous region's last
> entry. So, the last key is inclusive but the first key is exclusive.
>
> look at my test code:
>     HFile.Reader reader = new HFile.Reader(fs, new Path(args[0]), null, true);
>     reader.loadFileInfo();
>     System.out.println("FirstKey:" + new String(reader.getFirstKey()));
>     System.out.println("LastKey:" + new String(reader.getLastKey()));
>     HFileScanner l = reader.getScanner();
>     l.seekTo(reader.getLastKey());
>     KeyValue t = l.getKeyValue();
>     System.out.println("last key:" + t.getKeyString() +
>         " last value length:" + t.getValueLength() + " value:" + t.getValue());
> and output is:
>
> FirstKey:00000d7d4f36c112imagevalue�������
> LastKey:333305184e0f7c3eimagevalue�������
> last key:\x00\x10333305184e0f7c3e\x05imagevalue\x7F\xFF\xFF\xFF\xFF\xFF\xFF\xFF\x04 last value length:3398 value:[...@8888e6c
>
> Thanks,
> Murali Krishna
>
>
> ________________________________
> From: stack <[email protected]>
> To: [email protected]
> Sent: Sun, 8 November, 2009 12:21:48 PM
> Subject: Re: Issue with bulk loader tool
>
> So, do you think we are dropping the first key in the region?
> Thanks,
> St.Ack
>
> On Sat, Nov 7, 2009 at 9:17 PM, Murali Krishna. P <[email protected]> wrote:
>
> > No, the first key is 6666909d611e8d7e for the region which says startKey is
> > 666629fe4378c096.
> > (this is actually the next key in the order).
> >
> > HFile -p:-
> > Scanning -> /hbase/test12/336573097/image/2362265315474952099
> > K: \x00\x106666909d611e8d7e\x05imagevalue\x7F\x..
> >
> > HFileUtil /hbase/test12/336573097/image/2362265315474952099 :-
> > FirstKey:6666909d611e8d7eimagevalue�������
> > LastKey:99998c8f356b0d86imagevalue�������
> >
> > But the scan .META. shows the start key as 666629fe4378c096. (attached
> > .META.)
> >
> > This seems to be the case for all the regions.
> > (the actual firstKey is the next one from the claimed firstKey)
> >
> > I am on hadoop0.20.0
> >
> > Thanks,
> > Murali Krishna
> >
> >
> > ------------------------------
> > *From:* stack <[email protected]>
> > *To:* [email protected]
> > *Sent:* Sun, 8 November, 2009 4:30:15 AM
> > *Subject:* Re: Issue with bulk loader tool
> >
> > It's what Lars says, Murali: a region's startkey is inclusive and its endkey
> > exclusive. If it exists, it should be in the region that has it for a start
> > key (It will not be duplicated in both).
> >
> > For .META., there is usually only one Region instance in a .META. table.
> > Its startkey will be the empty key so it's not surprising its first key is
> > different from the empty key. What do you see when you look at the second
> > region in your just uploaded table? I'd expect the key 666629fe4378c096 to
> > be first in the region whose startkey is 666629fe4378c096.
> >
> > Thanks for figuring MAPREDUCE-565 could trip us up. Your hadoop is not
> > 0.20.1?
> >
> > Yours,
> > St.Ack
> >
> >
> > On Sat, Nov 7, 2009 at 7:58 AM, Murali Krishna. P <[email protected]> wrote:
> >
> > > Thanks Lars for the clarification,
> > > But where does the record reside? Is it duplicated to both the regions?
> > > When I use HFile.Reader, the first key in the second region is different.
> > > Maybe this behaviour (overlap) is only in .META.?
> > > The issue is that when I request that boundary record, it is loading
> > > the next region.
> > >
> > > 09/11/07 07:52:05 DEBUG client.HConnectionManager$TableServers: Cached
> > > location address: 76.13.20.58:60020, regioninfo: REGION => {NAME =>
> > > '.META.,,1', STARTKEY => '', ENDKEY => '', ENCODED => 1028785192, TABLE =>
> > > {{NAME => '.META.', IS_META => 'true', MEMSTORE_FLUSHSIZE => '16384',
> > > FAMILIES => [{NAME => 'historian', VERSIONS => '2147483647', COMPRESSION =>
> > > 'NONE', TTL => '604800', BLOCKSIZE => '8192', IN_MEMORY => 'false',
> > > BLOCKCACHE => 'false'}, {NAME => 'info', VERSIONS => '10', COMPRESSION =>
> > > 'NONE', TTL => '2147483647', BLOCKSIZE => '8192', IN_MEMORY => 'false',
> > > BLOCKCACHE => 'false'}]}}
> > > 09/11/07 07:52:05 DEBUG client.HConnectionManager$TableServers: Cached
> > > location address: 76.13.20.114:60020, regioninfo: REGION => {NAME =>
> > > 'test12,333305184e0f7c3e,1257515988652', STARTKEY => '333305184e0f7c3e',
> > > ENDKEY => '666629fe4378c096', ENCODED => 170637321, TABLE => {{NAME =>
> > > 'test12', FAMILIES => [{NAME => 'image', VERSIONS => '3', COMPRESSION =>
> > > 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false',
> > > BLOCKCACHE => 'true'}]}}
> > >
> > > Thanks,
> > > Murali Krishna
> > >
> > >
> > > ________________________________
> > > From: Lars George <[email protected]>
> > > To: "[email protected]" <[email protected]>
> > > Sent: Sat, 7 November, 2009 9:19:37 PM
> > > Subject: Re: Issue with bulk loader tool
> > >
> > > Hi Murali,
> > >
> > > What you see is normal; the last keys do indeed overlap. The last key of a
> > > region is exclusive and marks the first key of the subsequent region.
> > >
> > > Lars
> > >
> > > On Nov 7, 2009, at 9:05, "Murali Krishna. P" <[email protected]> wrote:
> > >
> > > > Hi,
> > > > I got it resolved. https://issues.apache.org/jira/browse/HADOOP-5750 was
> > > > causing this: even though I supplied a custom total ordering partitioner,
> > > > it didn't use that.
> > > >
> > > > Now the regions look properly sorted, but I am facing a new issue. The
> > > > last key of each region is not retrievable. The table.jsp page shows the
> > > > start and end key wrongly.
> > > > for eg, take first 2 regions
> > > > region1: start :                  end: 333305184e0f7c3e
> > > > region2: start: 333305184e0f7c3e  end: 666629fe4378c096
> > > >
> > > > The end key of first region = start key of second ??
> > > >
> > > > If I get the first and last key using HFile.Reader, it shows as follows:
> > > >
> > > > HFileUtil /hbase/test12/98766318/image/9052388247118781160
> > > > FirstKey:00000d7d4f36c112imagevalue�������
> > > > LastKey:333305184e0f7c3eimagevalue�������
> > > >
> > > > HFileUtil /hbase/test12/170637321/image/7602871928600243730
> > > > FirstKey:33338d45cc2491b8imagevalue�������
> > > > LastKey:666629fe4378c096imagevalue�������
> > > >
> > > > So, according to this, the first key of the 2nd region is
> > > > 33338d45cc2491b8, not 333305184e0f7c3e, which is correct!
> > > >
> > > > Now when I do a get on 333305184e0f7c3e with debug on, it is loading the
> > > > second region, which is wrong!
> > > >
> > > > Something went wrong with the index?
> > > >
> > > > Thanks,
> > > > Murali Krishna
> > > >
> > > >
> > > > ________________________________
> > > > From: stack <[email protected]>
> > > > To: [email protected]
> > > > Sent: Sat, 7 November, 2009 6:26:03 AM
> > > > Subject: Re: Issue with bulk loader tool
> > > >
> > > > On Fri, Nov 6, 2009 at 12:58 AM, Murali Krishna. P
> > > > <[email protected]> wrote:
> > > >
> > > >> Hi,
> > > >> If I increase hbase.hregion.max.filesize so that all the records fit in
> > > >> one region (and one reducer), all the records are retrievable. If one
> > > >> reducer creates multiple hfiles or multiple reducers create one hfile
> > > >> each, the problem occurs.
> > > >>
> > > >
> > > > Multiple hfiles in a region?
> > > > Or are you saying if a reducer creates multiple regions? There is
> > > > supposed to be one file per region only when done.
> > > >
> > > > Thanks for digging in,
> > > > St.Ack
> > > >
> > > >
> > > >> Does that give any clue?
> > > >>
> > > >> Thanks,
> > > >> Murali Krishna
> > > >>
> > > >>
> > > >> ________________________________
> > > >> From: Murali Krishna. P <[email protected]>
> > > >> To: [email protected]
> > > >> Sent: Thu, 5 November, 2009 6:34:20 PM
> > > >> Subject: Re: Issue with bulk loader tool
> > > >>
> > > >> Hi Stack,
> > > >> Sorry, could not look into this last week...
> > > >>
> > > >> I got the problem with the HTable interface as well. Some records I am
> > > >> not able to retrieve from HTable either.
> > > >> I lost the old table, but reproduced the problem with a different table.
> > > >>
> > > >> I cannot send the region since it is very huge. Will try to give as much
> > > >> info as possible here :)
> > > >>
> > > >> There are total 5 regions as below in that table:
> > > >>
> > > >> Name                                  Encoded Name  Start Key         End Key
> > > >> test1,,1257414794600                  106817540                       fffe9c7f87c8332a
> > > >> test1,fffe9c7f87c8332a,1257414794616  1346846599    fffe9c7f87c8332a  fffebe279c0ac4d2
> > > >> test1,fffebe279c0ac4d2,1257414794628  1835851728    fffebe279c0ac4d2  fffec418284d6fbc
> > > >> test1,fffec418284d6fbc,1257414794637  1078205908    fffec418284d6fbc  fffef7a12ea22498
> > > >> test1,fffef7a12ea22498,1257414794647  1515378663    fffef7a12ea22498
> > > >>
> > > >> I am looking for a key, say 000011d1bc8cd6fe. This should be in the
> > > >> first region?
> > > >>
> > > >> using hfile tool,
> > > >> org.apache.hadoop.hbase.io.hfile.HFile -k -f
> > > >> /hbase/test1/106817540/image/3828859735461759684 -v -m -p | grep
> > > >> 000011d1bc8cd6fe
> > > >> The first region doesn't have it. Not sure what happened to that record.
> > > >>
> > > >> For a working key, it gives the record properly as below
> > > >> K: \x00\x100003bdd08ca88ee2\x05imagevalue\x7F\xFF\xFF\xFF\xFF\xFF\xFF\xFF\x04
> > > >> V: \xFF...
> > > >>
> > > >> Please let me know if you need more information
> > > >>
> > > >> Thanks,
> > > >> Murali Krishna
> > > >>
> > > >>
> > > >> ________________________________
> > > >> From: stack <[email protected]>
> > > >> To: [email protected]
> > > >> Sent: Mon, 2 November, 2009 11:05:43 PM
> > > >> Subject: Re: Issue with bulk loader tool
> > > >>
> > > >> Murali:
> > > >>
> > > >> Any developments worth mentioning?
> > > >>
> > > >> St.Ack
> > > >>
> > > >>
> > > >> On Fri, Oct 30, 2009 at 10:14 AM, stack <[email protected]> wrote:
> > > >>
> > > >>> That is interesting. It'd almost point to a shell issue. Enable DEBUG
> > > >>> so client can see it. Then rerun shell. Is it at least loading the
> > > >>> right region? (The region's start and end keys span the asked for
> > > >>> key?). I took a look at your attached .META. scan. All looks good
> > > >>> there. The region specifications look right. If you want to bundle up
> > > >>> the region that is failing -- the one that the failing key comes out
> > > >>> of, I can take a look here. You could also try playing with the HFile
> > > >>> tool: ./bin/hbase org.apache.hadoop.hbase.io.hfile.HFile. Run the
> > > >>> former and it'll output usage. You should be able to get it to dump
> > > >>> content of the region (You need to supply flags like -v to see actual
> > > >>> keys to the HFile tool else it just runs its check silently). Check
> > > >>> for your key. Check things like timestamp on it. Maybe it's 100 years
> > > >>> in advance of now or something?
> > > >>>
> > > >>> Yours,
> > > >>> St.Ack
> > > >>>
> > > >>>
> > > >>> On Fri, Oct 30, 2009 at 9:01 AM, Murali Krishna. P
> > > >>> <[email protected]> wrote:
> > > >>>
> > > >>>> Attached ".META"
> > > >>>>
> > > >>>> Interesting, I was able to get the row from HTable via java code. But
> > > >>>> from the shell, still getting the following
> > > >>>>
> > > >>>> hbase(main):004:0> get 'TestTable2', 'ffffef95bcbf2638'
> > > >>>> 0 row(s) in 1.2250 seconds
> > > >>>>
> > > >>>> Thanks,
> > > >>>> Murali Krishna
> > > >>>>
> > > >>>>
> > > >>>> ------------------------------
> > > >>>> *From:* stack <[email protected]>
> > > >>>> *To:* [email protected]
> > > >>>> *Sent:* Fri, 30 October, 2009 8:39:46 PM
> > > >>>> *Subject:* Re: Issue with bulk loader tool
> > > >>>>
> > > >>>> Can you send a listing of ".META."?
> > > >>>>
> > > >>>> hbase> scan ".META."
> > > >>>>
> > > >>>> Also, can you bring a region down from hdfs, tar and gzip it, and then
> > > >>>> put it someplace I can pull so I can take a look?
> > > >>>>
> > > >>>> Thanks,
> > > >>>> St.Ack
> > > >>>>
> > > >>>>
> > > >>>> On Fri, Oct 30, 2009 at 3:31 AM, Murali Krishna. P
> > > >>>> <[email protected]> wrote:
> > > >>>>
> > > >>>>> Hi guys,
> > > >>>>> I created a table according to hbase-48: a mapreduce job which
> > > >>>>> creates HFiles, and then used the loadtable.rb script to create the
> > > >>>>> table. Everything worked fine and I was able to scan the table. But
> > > >>>>> when I do a get for a key displayed in the scan output, it is not
> > > >>>>> retrieving the row. Shell says 0 row.
> > > >>>>>
> > > >>>>> I tried using one reducer to ensure total ordering, but still same
> > > >>>>> issue.
> > > >>>>>
> > > >>>>> My mapper is like:
> > > >>>>>     context.write(
> > > >>>>>         new ImmutableBytesWritable(((Text)key).toString().getBytes()),
> > > >>>>>         new KeyValue(((Text)key).toString().getBytes(), "family1".getBytes(),
> > > >>>>>             "column1".getBytes(), getValueBytes()));
> > > >>>>>
> > > >>>>> Please help me investigate this.
> > > >>>>>
> > > >>>>> Thanks,
> > > >>>>> Murali Krishna
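
The boundary rule discussed in this thread (a region's start key is inclusive, its end key exclusive) can be sketched as a floor lookup over sorted start keys. This is only an illustration, not HBase's client code: the class name, method names and region labels below are hypothetical, the third region is inferred from the second region's end key, and plain String ordering stands in for HBase's byte-wise key comparison (which should match for these ASCII hex keys). Under that rule a get on 333305184e0f7c3e is routed to the region whose start key is 333305184e0f7c3e, so a row written as the last entry of the previous region's HFile would not be found there.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of "start key inclusive, end key exclusive" region
// lookup. It does not use or reproduce any HBase API.
class RegionLookupSketch {
    // Start key -> region label. A region covers [its start key, next start key).
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    void addRegion(String startKey, String label) {
        regionsByStartKey.put(startKey, label);
    }

    // Returns the region with the greatest start key <= rowKey, or null if none.
    String locate(String rowKey) {
        Map.Entry<String, String> entry = regionsByStartKey.floorEntry(rowKey);
        return entry == null ? null : entry.getValue();
    }

    // Region boundaries as reported for table test12 in the thread.
    static RegionLookupSketch fromThread() {
        RegionLookupSketch lookup = new RegionLookupSketch();
        lookup.addRegion("", "region1");                  // end: 333305184e0f7c3e
        lookup.addRegion("333305184e0f7c3e", "region2");  // end: 666629fe4378c096
        lookup.addRegion("666629fe4378c096", "region3");  // inferred from region2's end key
        return lookup;
    }

    public static void main(String[] args) {
        RegionLookupSketch lookup = fromThread();
        // Boundary key: belongs to the region that starts with it.
        System.out.println(lookup.locate("333305184e0f7c3e"));
        // Keys strictly inside the first and second regions.
        System.out.println(lookup.locate("00000d7d4f36c112"));
        System.out.println(lookup.locate("33338d45cc2491b8"));
    }
}
```

Read this way, the HFileUtil output quoted above (LastKey 333305184e0f7c3e in the first region's file) is consistent with the boundary row having been written one file too early.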
