May I see the dump? You can send it to me off-list if you prefer. Send the output of 'select * from .META.;' rather than just the info:regioninfo column.

It's possible to have overlapping regions if both a parent and its split daughters are present in the table. The parent, the overlapping region, will be legitimately offline.
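(For reference, a rough Java sketch of doing the same check programmatically: scan .META. and print any region whose info:regioninfo is marked offline or split. This assumes the trunk-era client API; the constants and helper calls here (HConstants.COL_REGIONINFO, Writables.getWritable, obtainScanner) are my best recollection, so treat names and signatures as approximate.)

import java.io.IOException;
import java.util.SortedMap;
import java.util.TreeMap;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.HScannerInterface;
import org.apache.hadoop.hbase.HStoreKey;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Writables;

public class FindOfflineRegions {
  public static void main(String[] args) throws IOException {
    HTable meta = new HTable(new HBaseConfiguration(), HConstants.META_TABLE_NAME);
    HScannerInterface scanner = meta.obtainScanner(
        new Text[] { HConstants.COL_REGIONINFO }, HConstants.EMPTY_START_ROW);
    try {
      HStoreKey key = new HStoreKey();
      SortedMap<Text, byte[]> row = new TreeMap<Text, byte[]>();
      while (scanner.next(key, row)) {
        // Deserialize the HRegionInfo stored in info:regioninfo.
        HRegionInfo info = (HRegionInfo) Writables.getWritable(
            row.get(HConstants.COL_REGIONINFO), new HRegionInfo());
        if (info.isOffline() || info.isSplit()) {
          // An offline+split row is the parent of a split, as above.
          System.out.println(key.getRow() + " -> " + info);
        }
        row.clear();
      }
    } finally {
      scanner.close();
    }
  }
}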

Do you have DEBUG enabled on your hbase cluster? If you do, I would be interested in seeing your logs too, to see if I can figure out how this scenario came about (we cleared up one such case recently; that fix is in 0.1.0).

How many regionservers?

Thanks David,
St.Ack


David Alves wrote:
Hi Again St.Ack

        I ran your command and there was a strange output. In fact you were
right: one of the regions is indeed offline, but the strange part is that
there are two regions like this:
tableA,,1207585555760
tableA,,1207588561303
        And it is the first one that is offline (marked as: offline: "true,
split: true,"). Isn't it correct that only the first region of a table
should have an empty ("") start key (it is the start key that appears next
to the table name, right?)
        Regarding the M/R job: more precisely, the error occurs in the
instantiation of the table object (one is instantiated for each M/R job).
The precise stack is:

Caused by: org.apache.hadoop.hbase.TableNotFoundException: Table 'tableA'
was not found.
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:415)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:346)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:308)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:89)

Best Regards
David Alves

-----Original Message-----
From: stack [mailto:[EMAIL PROTECTED]
Sent: Tuesday, April 08, 2008 12:14 AM
To: [email protected]
Subject: Re: StackOverFlow Error in HBase

The below all sounds good (including the bit where we need to change TIF
so you can specify filters and so it's subclassable).

Can you run the commands suggested below to determine whether the online
table does indeed have an offlined region in its midst?

St.Ack


David Alves wrote:
Hi St.Ack

        Firstly, this is a trunk (HEAD) version updated & patched today
(Monday) running against Hadoop trunk (HEAD) updated Friday.
        The M/R job (a crawler, amongst other things) scans a table (let's
say table A) through a custom TableInputFormat (because it requires
filters), but it is only a slightly altered version of the TableInputFormat
class (btw I would suggest redesigning the class to allow for extension and
would gladly help). At the end of the map phase, and for each record, the
new references found are inserted into table A (this is precisely where the
job fails, under load), and by the end of the reduce phase the processed
records are inserted into table B.
        My question here is: how should I cope with this kind of failure?

Best Regards
David Alves


-----Original Message-----
From: stack [mailto:[EMAIL PROTECTED]
Sent: Monday, April 07, 2008 7:21 PM
To: [email protected]
Subject: Re: StackOverFlow Error in HBase

I've seen TNFE when a region in the middle of an online table is
offline.  Shouldn't ever happen but....

What I've seen is that in the shell you can do 'show tables;' and it will
list all tables, including the one reporting TNFE.

You then attempt a get or a scan against the table and you get the TNFE
exception.

Is this what you are seeing?

Try doing a 'select info:regioninfo from .META.;'  Look for a region
marked offline.  Might be easier if you run the query like this:

% echo 'select info:regioninfo from .META.;' | ./bin/hbase --html &> /tmp/query.out

...because then you can grep around in the /tmp/query.out file, or just
send it to us off-list and we'll take a look.

For sure this is 0.1.0?

Thanks,
St.Ack


David Alves wrote:

Hi all

        I think we can consider that the test has passed: previous error
logs told me the M/R job failed around 35,000 records, and this job has
reached 42,000, failing for a whole other reason:

Caused by: org.apache.hadoop.hbase.TableNotFoundException: Table 'XXXXX'
was not found.
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:415)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:346)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:308)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:89)

Is there a known workaround for this problem? I know for sure the table
exists, as it has been used in the previous 25 M/R jobs. Should I make my
code wait and retry until the table is up again?
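(One possible shape for the wait-and-retry asked about above; a minimal
sketch only, with illustrative retry bounds, assuming the trunk-era HTable
constructor that takes a Text table name:)

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableNotFoundException;
import org.apache.hadoop.hbase.client.HTable;

public class OpenTableWithRetry {
  public static HTable open(HBaseConfiguration conf, Text tableName)
      throws IOException, InterruptedException {
    IOException last = null;
    for (int attempt = 0; attempt < 10; attempt++) {   // bound is arbitrary
      try {
        return new HTable(conf, tableName);
      } catch (TableNotFoundException e) {
        // The table exists but is transiently unlocatable; back off, retry.
        last = e;
        Thread.sleep(5000L);
      }
    }
    throw last;   // still failing after all attempts: give up
  }
}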


Regards
David Alves



-----Original Message-----
From: Jim Kellerman [mailto:[EMAIL PROTECTED]
Sent: Monday, April 07, 2008 5:09 PM
To: [email protected]
Subject: RE: StackOverFlow Error in HBase

Yes, trunk is fine since there are no changes in filters between 0.1 and
trunk.

---
Jim Kellerman, Senior Engineer; Powerset




-----Original Message-----
From: David Alves [mailto:[EMAIL PROTECTED]
Sent: Monday, April 07, 2008 8:44 AM
To: [email protected]
Subject: RE: StackOverFlow Error in HBase

Hi Jim
        The job I left running before the weekend had some
(other) problems, mainly about a Hadoop API change.
        Anyway, I'm running it again right now and at first
glance it's working (I'll know for sure in about an hour). On
a different note, there was a problem with RegExpRowFilter
where, if it received more than one conditional in the
constructor map, it would filter out records it shouldn't;
that problem is now solved.
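(For illustration, a sketch of the multi-conditional usage that used to
misfilter: two column criteria in one constructor map. The column names
are made up, and the constructor shape, a row-key regex plus a Map of
column criteria, is my recollection of the API, so treat it as
approximate:)

import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.hbase.filter.RegExpRowFilter;

public class MultiConditionalFilter {
  public static RegExpRowFilter build() {
    // Before the fix, rows satisfying BOTH criteria could still be
    // filtered out incorrectly when the map held more than one entry.
    Map<Text, byte[]> criteria = new HashMap<Text, byte[]>();
    criteria.put(new Text("meta:status"), "fetched".getBytes()); // hypothetical column
    criteria.put(new Text("meta:depth"), "1".getBytes());        // hypothetical column
    return new RegExpRowFilter(".*", criteria);  // ".*" matches any row key
  }
}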
        Since on Friday, before I got your response, I had
already upgraded the cluster to the Hadoop and HBase trunk
versions, I'm currently testing with these versions instead
of 0.1; I hope there is no problem there.
        I'll send another email soon.

Regards
David Alves

On Mon, 2008-04-07 at 08:31 -0700, Jim Kellerman wrote:


David,

Any luck running this patch, either against head or against the 0.1 branch?


Thanks.

---
Jim Kellerman, Senior Engineer; Powerset




-----Original Message-----
From: David Alves [mailto:[EMAIL PROTECTED]
Sent: Friday, April 04, 2008 10:05 AM
To: [email protected]
Subject: RE: StackOverFlow Error in HBase

Hi Jim

        Of course; my question was regarding whether I should use
HEAD or some branch or tag.
        Anyway, I'm currently running HBase HEAD patched against
Hadoop HEAD; I'll know if it's OK soon.

Regards
David Alves
On Fri, 2008-04-04 at 09:18 -0700, Jim Kellerman wrote:


After applying the patch, you have to rebuild and deploy on your
cluster, run your test that was failing and verify that it now works.

See
http://hadoop.apache.org/hbase/docs/current/api/overview-summary.html#overview_description



---
Jim Kellerman, Senior Engineer; Powerset




-----Original Message-----
From: David Alves [mailto:[EMAIL PROTECTED]
Sent: Friday, April 04, 2008 6:29 AM
To: [email protected]
Subject: RE: StackOverFlow Error in HBase

Hi all again

        I've never used the patch system you guys use, so in order
to test the patch submitted by Clint, what do I have to do? I
mean, I've updated HEAD and applied the patch; is this it?

Regards
David Alves



On Thu, 2008-04-03 at 10:18 -0700, Jim Kellerman wrote:


Thanks David. I'll add 554 as a blocker for 0.1.1

---
Jim Kellerman, Senior Engineer; Powerset




-----Original Message-----
From: David Alves [mailto:[EMAIL PROTECTED]
Sent: Thursday, April 03, 2008 9:21 AM
To: [EMAIL PROTECTED]
Subject: RE: StackOverFlow Error in HBase

Hi Jim and all

        I'll commit to testing the patch under the same conditions as
it failed before (with around 36,000 records), but at this
precise moment I'm preparing my next development iteration,
which means a lot of meetings.
        By the end of the day tomorrow (Friday) I should have a
confirmation whether the patch worked (or not).

Regards
David Alves

On Thu, 2008-04-03 at 09:12 -0700, Jim Kellerman wrote:


David,

Have you had a chance to try this patch? We are about to release
hbase-0.1.1, and until we receive a confirmation in HBASE-554 from
another person who has tried it and verifies that it works, we
cannot include it in this release. If it is not in this release,
there will be a significant wait for it to appear in an hbase
release. hbase-0.1.2 will not happen anytime soon unless critical
issues arise that have not been fixed in 0.1.1. hbase-0.2.0 is also
some time in the future; there are a significant number of issues
to address before that release is ready.

Frankly, I'd like to see this patch in 0.1.1, because it is an
issue for people that use filters.

The alternative would be for Clint to supply a test case that
fails without the patch but passes with the patch.

We will hold up the release, but we need a commitment either from
David to test the patch or from Clint to supply a test. We need
that commitment by the end of the day today, 2008/04/03, along
with an ETA as to when it will be completed.


---
Jim Kellerman, Senior Engineer; Powerset




-----Original Message-----
From: David Alves


[mailto:[EMAIL PROTECTED]


Sent: Tuesday, April 01, 2008 2:36 PM
To: [EMAIL PROTECTED]
Subject: RE: StackOverFlow Error in HBase

Hi

        I just deployed the unpatched version.
        Tomorrow I'll rebuild the system with the patch and try it out.
        Thanks again.

Regards
David Alves



-----Original Message-----
From: Jim Kellerman [mailto:[EMAIL PROTECTED]
Sent: Tuesday, April 01, 2008 10:04 PM
To: [EMAIL PROTECTED]
Subject: RE: StackOverFlow Error in HBase

David,

Have you tried this patch, and does it work for you? If so, we'll
include it in hbase-0.1.1.

---
Jim Kellerman, Senior Engineer; Powerset




-----Original Message-----
From: David Alves


[mailto:[EMAIL PROTECTED]


Sent: Tuesday, April 01, 2008 10:44 AM
To: [EMAIL PROTECTED]
Subject: RE: StackOverFlow Error in HBase

Hi
        Thanks for the prompt patch, Clint, St.Ack and all you guys.


Regards
David Alves



-----Original Message-----
From: [EMAIL PROTECTED]


[mailto:[EMAIL PROTECTED]


On Behalf


Of Clint Morgan
Sent: Tuesday, April 01, 2008 2:04 AM
To: [EMAIL PROTECTED]
Subject: Re: StackOverFlow Error in HBase

Try the patch at https://issues.apache.org/jira/browse/HBASE-554.


cheers,
-clint

On Mon, Mar 31, 2008 at 5:39 AM, David Alves
<[EMAIL PROTECTED]> wrote:


Hi ... again

        In my previous mail I stated that increasing the stack size solved
the problem; well, I jumped a little bit to the conclusion. In fact it
didn't: the StackOverflowError always occurs at the end of the cycle, when
no more records match the filter. Anyway, I've rewritten my application to
use a normal scanner and do the "filtering" after, which is not optimal
but it works.
        I'm just saying this because it might be a clue: in previous
versions (!= 0.1.0), even though a more serious problem happened
(regionservers became unresponsive after so many records), this didn't
happen. Btw, in the current version I notice no, or a very small, decrease
of throughput with time. Great work!

 Regards
 David Alves







 On Mon, 2008-03-31 at 05:18 +0100, David Alves wrote:

 > Hi again
 >
 >       As I was almost at the end (80%) of indexable docs, for the time
 > being I simply increased the stack size, which seemed to work.
 >       Thanks for your input St.Ack, it really helped me solve the
 > problem, at least for the moment.
 >       On another note, in the same method I changed the way the scanner
 > was obtained when htable.getStartKeys() would be more than 1, so that I
 > could limit the records read each time to a single region, and the
 > scanning would start at the last region. Strangely, the number of keys
 > obtained by htable.getStartKeys() was always 1, even though by the end
 > there were already 21 regions.
 >       Any thoughts?
 >
 > Regards
 > David Alves
 >
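(A minimal sketch of the region-limited scan described above, assuming the
trunk-era scanner API: getStartKeys() should return one start key per
region, and scanning from the last of them covers only the final region.
Signatures are approximate.)

import java.io.IOException;
import java.util.SortedMap;
import java.util.TreeMap;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.hbase.HScannerInterface;
import org.apache.hadoop.hbase.HStoreKey;
import org.apache.hadoop.hbase.client.HTable;

public class LastRegionScan {
  public static void scanLastRegion(HTable table, Text[] columns) throws IOException {
    Text[] startKeys = table.getStartKeys();   // expected: one entry per region
    Text lastRegionStart = startKeys[startKeys.length - 1];
    // Scan from the last region's start key through the end of the table.
    HScannerInterface scanner = table.obtainScanner(columns, lastRegionStart);
    try {
      HStoreKey key = new HStoreKey();
      SortedMap<Text, byte[]> row = new TreeMap<Text, byte[]>();
      while (scanner.next(key, row)) {
        // process the row here ...
        row.clear();
      }
    } finally {
      scanner.close();
    }
  }
}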
 > > -----Original Message-----
 > > From: stack [mailto:[EMAIL PROTECTED]
 > > Sent: Sunday, March 30, 2008 9:36 PM
 > > To: [EMAIL PROTECTED]
 > > Subject: Re: StackOverFlow Error in HBase
 > >
 > > You're doing nothing wrong.
 > >
 > > The filters as written recurse until they find a match. If there are
 > > long stretches between matching rows, then you will get a
 > > StackOverflowError. Filters need to be changed. Thanks for pointing
 > > this out. Can you do without them for the moment until we get a
 > > chance to fix it? (HBASE-554)
 > >
 > > Thanks,
 > > St.Ack
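(To make the failure mode concrete, here is a self-contained sketch, not
the actual HBase code, of why recursing past non-matching rows overflows
the stack while a plain loop does not:)

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class FilterSkipSketch {
  // Stand-in for a row filter: only one row in a million matches.
  static boolean matches(int row) { return row % 1000000 == 0; }

  // Recursive skip: one stack frame per filtered-out row, so a long
  // stretch of non-matching rows throws StackOverflowError.
  static Integer nextRecursive(Iterator<Integer> rows) {
    if (!rows.hasNext()) return null;
    int row = rows.next();
    return matches(row) ? Integer.valueOf(row) : nextRecursive(rows);
  }

  // Iterative skip: constant stack depth regardless of the stretch.
  static Integer nextIterative(Iterator<Integer> rows) {
    while (rows.hasNext()) {
      int row = rows.next();
      if (matches(row)) return Integer.valueOf(row);
    }
    return null;
  }

  public static void main(String[] args) {
    List<Integer> rows = new ArrayList<Integer>();
    for (int i = 1; i <= 2000000; i++) rows.add(Integer.valueOf(i));
    System.out.println(nextIterative(rows.iterator()));  // prints 1000000
    // nextRecursive(rows.iterator()) would blow the stack here.
  }
}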


 > > David Alves wrote:
 > > > Hi St.Ack and all
 > > >
 > > >   The error always occurs when trying to see if there are more rows
 > > > to process.
 > > >   Yes, I'm using a filter (RegExpRowFilter) to select only the rows
 > > > (any row key) that match a specific value in one of the columns.
 > > >   Then I obtain the scanner, just test the hasNext method, close the
 > > > scanner and return.
 > > >   Am I doing something wrong?
 > > >   Still, StackOverflowError is not supposed to happen, right?
 > > >
 > > > Regards
 > > > David Alves
 > > > On Thu, 2008-03-27 at 12:36 -0700, stack wrote:
 > > >
 > > >> You are using a filter?  If so, tell us more about it.
 > > >> St.Ack
 > > >>
 > > >> David Alves wrote:
 > > >>
 > > >>> Hi guys
 > > >>>
 > > >>>         I'm using HBase to keep data that is later indexed.
 > > >>>         The data is indexed in chunks, so the cycle is: get XXXX
 > > >>> records, index them, check for more records, etc...
 > > >>>         When I tried the candidate-2 instead of the old 0.16.0
 > > >>> (which I switched to due to the regionservers becoming
 > > >>> unresponsive) I got the error at the end of this email, well into
 > > >>> an indexing job.
 > > >>>         So do you have any idea why? Am I doing something wrong?
 > > >>>
 > > >>> David Alves
 > > >>>
 > > >>> java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException:
 > > >>> java.io.IOException: java.lang.StackOverflowError
 > > >>>         at java.io.DataInputStream.readFully(DataInputStream.java:178)
 > > >>>         at java.io.DataInputStream.readLong(DataInputStream.java:399)
 > > >>>         at org.apache.hadoop.dfs.DFSClient$BlockReader.readChunk(DFSClient.java:735)
 > > >>>         at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:234)
 > > >>>         at org.apache.hadoop.fs.FSInputChecker.fill(FSInputChecker.java:176)
 > > >>>         at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:193)
 > > >>>         at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:157)
 > > >>>         at org.apache.hadoop.dfs.DFSClient$BlockReader.read(DFSClient.java:658)
 > > >>>         at org.apache.hadoop.dfs.DFSClient$DFSInputStream.readBuffer(DFSClient.java:1130)
 > > >>>         at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1166)
 > > >>>         at java.io.DataInputStream.readFully(DataInputStream.java:178)
 > > >>>         at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:56)
 > > >>>         at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:90)
 > > >>>         at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1829)
 > > >>>         at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1729)
 > > >>>         at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1775)
 > > >>>         at org.apache.hadoop.io.MapFile$Reader.next(MapFile.java:461)
 > > >>>         at org.apache.hadoop.hbase.HStore$StoreFileScanner.getNext(HStore.java:2350)
 > > >>>         at org.apache.hadoop.hbase.HAbstractScanner.next(HAbstractScanner.java:256)
 > > >>>         at org.apache.hadoop.hbase.HStore$HStoreScanner.next(HStore.java:2561)
 > > >>>         at org.apache.hadoop.hbase.HRegion$HScanner.next(HRegion.java:1807)
 > > >>>         at org.apache.hadoop.hbase.HRegion$HScanner.next(HRegion.java:1843)
 > > >>>         at org.apache.hadoop.hbase.HRegion$HScanner.next(HRegion.java:1843)
 > > >>>         at org.apache.hadoop.hbase.HRegion$HScanner.next(HRegion.java:1843)
 > > >>>         at org.apache.hadoop.hbase.HRegion$HScanner.next(HRegion.java:1843)
 > > >>>         at org.apache.hadoop.hbase.HRegion$HScanner.next(HRegion.java:1843)
 > > >>>         at org.apache.hadoop.hbase.HRegion$HScanner.next(HRegion.java:1843)
 > > >>>         at org.apache.hadoop.hbase.HRegion$HScanner.next(HRegion.java:1843)
 > > >>>         at org.apache.hadoop.hbase.HRegion$HScanner.next(HRegion.java:1843)
 > > >>>         at org.apache.hadoop.hbase.HRegion$HScanner.next(HRegion.java:1843)
 > > >>>         at org.apache.hadoop.hbase.HRegion$HScanner.next(HRegion.java:1843)
 > > >>>         at org.apache.hadoop.hbase.HRegion$HScanner.next(HRegion.java:1843)
 > > >>>         at org.apache.hadoop.hbase.HRegion$HScanner.next(HRegion.java:1843)
 > > >>>         at org.apache.hadoop.hbase.HRegion$HScanner.next(HRegion.java:1843)
 > > >>>         at org.apache.hadoop.hbase.HRegion$HScanner.next(HRegion.java:1843)
 > > >>>         at org.apache.hadoop.hbase.HRegion$HScanner.next(HRegion.java:1843)
 > > >>>         at org.apache.hadoop.hbase.HRegion$HScanner.next(HRegion.java:1843)
 > > >>>         at org.apache.hadoop.hbase.HRegion$HScanner.next(HRegion.java:1843)
 > > >>> ...



