May I see the dump? You can send it to me off-list if you prefer. Send the output of 'select * from .META.;' rather than just the info:regioninfo column.

It's possible to have overlapping regions if both a parent and its split daughters are present in the table. The parent, the overlapping region, will be legitimately offline.
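(For reference, a rough Java sketch of doing the same check programmatically: scan .META. and print any region whose info:regioninfo is marked offline or split. This assumes the trunk-era client API; the constants and helper calls here (HConstants.COL_REGIONINFO, Writables.getWritable, obtainScanner) are my best recollection, so treat names and signatures as approximate.)

import java.io.IOException;
import java.util.SortedMap;
import java.util.TreeMap;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.HScannerInterface;
import org.apache.hadoop.hbase.HStoreKey;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Writables;

public class FindOfflineRegions {
  public static void main(String[] args) throws IOException {
    HTable meta = new HTable(new HBaseConfiguration(), HConstants.META_TABLE_NAME);
    HScannerInterface scanner = meta.obtainScanner(
        new Text[] { HConstants.COL_REGIONINFO }, HConstants.EMPTY_START_ROW);
    try {
      HStoreKey key = new HStoreKey();
      SortedMap<Text, byte[]> row = new TreeMap<Text, byte[]>();
      while (scanner.next(key, row)) {
        // Deserialize the HRegionInfo stored in info:regioninfo.
        HRegionInfo info = (HRegionInfo) Writables.getWritable(
            row.get(HConstants.COL_REGIONINFO), new HRegionInfo());
        if (info.isOffline() || info.isSplit()) {
          // An offline+split row is the parent of a split, as above.
          System.out.println(key.getRow() + " -> " + info);
        }
        row.clear();
      }
    } finally {
      scanner.close();
    }
  }
}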

Do you have DEBUG enabled on your hbase cluster? If you do, I would be interested in seeing your logs too, to see if I can figure out how this scenario came about (we cleared up one such case recently; that fix is in 0.1.0).

How many regionservers?

Thanks David,
St.Ack


David Alves wrote:
Hi Again St.Ack

        I ran your command and there was a strange output. In fact you were
right: one of the regions is indeed offline, but the strange part is that
there are two regions like this:
tableA,,1207585555760
tableA,,1207588561303
        And it is the first one that is offline (marked as: offline: "true,
split: true,"). Isn't it correct that only the first region of a table
should have an empty ("") start key (it is the start key that appears next
to the table name, right?)
        Regarding the M/R job: more precisely, the error occurs in the
instantiation of the table object (one is instantiated for each M/R job).
The precise stack is:

Caused by: org.apache.hadoop.hbase.TableNotFoundException: Table 'tableA'
was not found.
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:415)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:346)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:308)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:89)

Best Regards
David Alves

-----Original Message-----
From: stack [mailto:[EMAIL PROTECTED]
Sent: Tuesday, April 08, 2008 12:14 AM
To: [email protected]
Subject: Re: StackOverFlow Error in HBase

The below all sounds good (including the bit where we need to change TIF
so you can specify filters and so it's subclassable).

Can you run the commands suggested below to determine whether the online
table does indeed have an offlined region in its midst?

St.Ack


David Alves wrote:
Hi St.Ack

        Firstly, this is a trunk (HEAD) version updated & patched today
(Monday) running against Hadoop trunk (HEAD) updated Friday.
        The M/R job (a crawler, amongst other things) scans a table (let's
say table A) through a custom TableInputFormat (because it requires
filters), but it is only a slightly altered version of the TableInputFormat
class (btw I would suggest redesigning the class to allow for extension and
would gladly help). At the end of the map phase, and for each record, the
new references found are inserted into table A (this is precisely where the
job fails, under load), and by the end of the reduce phase the processed
records are inserted into table B.
        My question here is: how should I cope with this kind of failure?

Best Regards
David Alves


-----Original Message-----
From: stack [mailto:[EMAIL PROTECTED]
Sent: Monday, April 07, 2008 7:21 PM
To: [email protected]
Subject: Re: StackOverFlow Error in HBase

I've seen TNFE when a region in the middle of an online table is
offline.  Shouldn't ever happen but....

What I've seen is that in the shell you can do 'show tables;' and it will
list all tables, including the one reporting TNFE.

You then attempt a get or a scan against the table and you get the TNFE
exception.

Is this what you are seeing?

Try doing a 'select info:regioninfo from .META.;'  Look for a region
marked offline.  Might be easier if you run the query like this:

% echo 'select info:regioninfo from .META.;' | ./bin/hbase --html &> /tmp/query.out

...because then you can grep around in the /tmp/query.out file, or just
send it to us off-list and we'll take a look.

For sure this is 0.1.0?

Thanks,
St.Ack


David Alves wrote:

Hi all

        I think we can consider that the test has passed: previous error
logs told me the M/R job failed around 35,000 records, and this job has
reached 42,000, failing for a whole other reason:

Caused by: org.apache.hadoop.hbase.TableNotFoundException: Table 'XXXXX'
was not found.
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:415)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:346)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:308)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:89)

Is there a known workaround for this problem? I know for sure the table
exists, as it has been used in the previous 25 M/R jobs. Should I make my
code wait and retry until the table is up again?
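(One possible shape for the wait-and-retry asked about above; a minimal
sketch only, with illustrative retry bounds, assuming the trunk-era HTable
constructor that takes a Text table name:)

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableNotFoundException;
import org.apache.hadoop.hbase.client.HTable;

public class OpenTableWithRetry {
  public static HTable open(HBaseConfiguration conf, Text tableName)
      throws IOException, InterruptedException {
    IOException last = null;
    for (int attempt = 0; attempt < 10; attempt++) {   // bound is arbitrary
      try {
        return new HTable(conf, tableName);
      } catch (TableNotFoundException e) {
        // The table exists but is transiently unlocatable; back off, retry.
        last = e;
        Thread.sleep(5000L);
      }
    }
    throw last;   // still failing after all attempts: give up
  }
}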


Regards
David Alves



-----Original Message-----
From: Jim Kellerman [mailto:[EMAIL PROTECTED]
Sent: Monday, April 07, 2008 5:09 PM
To: [email protected]
Subject: RE: StackOverFlow Error in HBase

Yes, trunk is fine since there are no changes in filters between 0.1 and
trunk.

---
Jim Kellerman, Senior Engineer; Powerset




-----Original Message-----
From: David Alves [mailto:[EMAIL PROTECTED]
Sent: Monday, April 07, 2008 8:44 AM
To: [email protected]
Subject: RE: StackOverFlow Error in HBase

Hi Jim
        The job I left running before the weekend had some
(other) problems, mainly about a Hadoop API change.
        Anyway, I'm running it again right now and at first
glance it's working (I'll know for sure in about an hour). On
a different note, there was a problem with RegExpRowFilter
where, if it received more than one conditional in the
constructor map, it would filter out records it shouldn't;
that problem is now solved.
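(For illustration, a sketch of the multi-conditional usage that used to
misfilter: two column criteria in one constructor map. The column names
are made up, and the constructor shape, a row-key regex plus a Map of
column criteria, is my recollection of the API, so treat it as
approximate:)

import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.hbase.filter.RegExpRowFilter;

public class MultiConditionalFilter {
  public static RegExpRowFilter build() {
    // Before the fix, rows satisfying BOTH criteria could still be
    // filtered out incorrectly when the map held more than one entry.
    Map<Text, byte[]> criteria = new HashMap<Text, byte[]>();
    criteria.put(new Text("meta:status"), "fetched".getBytes()); // hypothetical column
    criteria.put(new Text("meta:depth"), "1".getBytes());        // hypothetical column
    return new RegExpRowFilter(".*", criteria);  // ".*" matches any row key
  }
}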
        Since on Friday, before I got your response, I had
already upgraded the cluster to the Hadoop and HBase trunk
versions, I'm currently testing with these versions instead
of 0.1; I hope there is no problem there.
        I'll send another email soon.

Regards
David Alves

On Mon, 2008-04-07 at 08:31 -0700, Jim Kellerman wrote:


David,

Any luck running this patch, either against head or against the 0.1 branch?


Thanks.

---
Jim Kellerman, Senior Engineer; Powerset




-----Original Message-----
From: David Alves [mailto:[EMAIL PROTECTED]
Sent: Friday, April 04, 2008 10:05 AM
To: [email protected]
Subject: RE: StackOverFlow Error in HBase

Hi Jim

        Of course; my question was regarding whether I should use
HEAD or some branch or tag.
        Anyway, I'm currently running HBase HEAD patched against
Hadoop HEAD; I'll know if it's OK soon.

Regards
David Alves
On Fri, 2008-04-04 at 09:18 -0700, Jim Kellerman wrote:


After applying the patch, you have to rebuild and deploy on your
cluster, run your test that was failing and verify that it now works.

See
http://hadoop.apache.org/hbase/docs/current/api/overview-summary.html#overview_description



---
Jim Kellerman, Senior Engineer; Powerset




-----Original Message-----
From: David Alves [mailto:[EMAIL PROTECTED]
Sent: Friday, April 04, 2008 6:29 AM
To: [email protected]
Subject: RE: StackOverFlow Error in HBase

Hi all again

        I've never used the patch system you guys use, so in order
to test the patch submitted by Clint, what do I have to do? I
mean, I've updated HEAD and applied the patch; is this it?

Regards
David Alves



On Thu, 2008-04-03 at 10:18 -0700, Jim Kellerman wrote:


Thanks David. I'll add 554 as a blocker for 0.1.1

---
Jim Kellerman, Senior Engineer; Powerset




-----Original Message-----
From: David Alves [mailto:[EMAIL PROTECTED]
Sent: Thursday, April 03, 2008 9:21 AM
To: [EMAIL PROTECTED]
Subject: RE: StackOverFlow Error in HBase

Hi Jim and all

        I'll commit to testing the patch under the same conditions as
it failed before (with around 36,000 records), but at this
precise moment I'm preparing my next development iteration,
which means a lot of meetings.
        By the end of the day tomorrow (Friday) I should have a
confirmation whether the patch worked (or not).

Regards
David Alves

On Thu, 2008-04-03 at 09:12 -0700, Jim Kellerman wrote:


David,

Have you had a chance to try this patch? We are about to release
hbase-0.1.1, and until we receive a confirmation in HBASE-554 from
another person who has tried it and verifies that it works, we
cannot include it in this release. If it is not in this release,
there will be a significant wait for it to appear in an hbase
release. hbase-0.1.2 will not happen anytime soon unless critical
issues arise that have not been fixed in 0.1.1. hbase-0.2.0 is also
some time in the future; there are a significant number of issues
to address before that release is ready.

Frankly, I'd like to see this patch in 0.1.1, because it is an
issue for people that use filters.

The alternative would be for Clint to supply a test case that
fails without the patch but passes with the patch.

We will hold up the release, but we need a commitment either from
David to test the patch or from Clint to supply a test. We need
that commitment by the end of the day today, 2008/04/03, along
with an ETA as to when it will be completed.


---
Jim Kellerman, Senior Engineer; Powerset




-----Original Message-----
From: David Alves


[mailto:[EMAIL PROTECTED]


Sent: Tuesday, April 01, 2008 2:36 PM
To: [EMAIL PROTECTED]
Subject: RE: StackOverFlow Error in HBase

Hi

        I just deployed the unpatched version.
        Tomorrow I'll rebuild the system with the patch and try it out.
        Thanks again.

Regards
David Alves



-----Original Message-----
From: Jim Kellerman [mailto:[EMAIL PROTECTED]
Sent: Tuesday, April 01, 2008 10:04 PM
To: [EMAIL PROTECTED]
Subject: RE: StackOverFlow Error in HBase

David,

Have you tried this patch, and does it work for you? If so, we'll
include it in hbase-0.1.1.

---
Jim Kellerman, Senior Engineer; Powerset




-----Original Message-----
From: David Alves


[mailto:[EMAIL PROTECTED]


Sent: Tuesday, April 01, 2008 10:44 AM
To: [EMAIL PROTECTED]
Subject: RE: StackOverFlow Error in HBase

Hi
        Thanks for the prompt patch, Clint, St.Ack and all you guys.


Regards
David Alves



-----Original Message-----
From: [EMAIL PROTECTED]


[mailto:[EMAIL PROTECTED]


On Behalf


Of Clint Morgan
Sent: Tuesday, April 01, 2008 2:04 AM
To: [EMAIL PROTECTED]
Subject: Re: StackOverFlow Error in HBase

Try the patch at https://issues.apache.org/jira/browse/HBASE-554.


cheers,
-clint

On Mon, Mar 31, 2008 at 5:39 AM, David Alves
<[EMAIL PROTECTED]> wrote:


Hi ... again

        In my previous mail I stated that increasing the stack size solved
the problem; well, I jumped a little bit to the conclusion. In fact it
didn't: the StackOverflowError always occurs at the end of the cycle, when
no more records match the filter. Anyway, I've rewritten my application to
use a normal scanner and do the "filtering" after, which is not optimal
but it works.
        I'm just saying this because it might be a clue: in previous
versions (!= 0.1.0), even though a more serious problem happened
(regionservers became unresponsive after so many records), this didn't
happen. Btw, in the current version I notice no, or a very small, decrease
of throughput with time. Great work!

 Regards
 David Alves







 On Mon, 2008-03-31 at 05:18 +0100, David Alves wrote:

 > Hi again
 >
 >       As I was almost at the end (80%) of indexable docs, for the time
 > being I simply increased the stack size, which seemed to work.
 >       Thanks for your input St.Ack, it really helped me solve the
 > problem, at least for the moment.
 >       On another note, in the same method I changed the way the scanner
 > was obtained when htable.getStartKeys() would be more than 1, so that I
 > could limit the records read each time to a single region, and the
 > scanning would start at the last region. Strangely, the number of keys
 > obtained by htable.getStartKeys() was always 1, even though by the end
 > there were already 21 regions.
 >       Any thoughts?
 >
 > Regards
 > David Alves
 >
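(A minimal sketch of the region-limited scan described above, assuming the
trunk-era scanner API: getStartKeys() should return one start key per
region, and scanning from the last of them covers only the final region.
Signatures are approximate.)

import java.io.IOException;
import java.util.SortedMap;
import java.util.TreeMap;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.hbase.HScannerInterface;
import org.apache.hadoop.hbase.HStoreKey;
import org.apache.hadoop.hbase.client.HTable;

public class LastRegionScan {
  public static void scanLastRegion(HTable table, Text[] columns) throws IOException {
    Text[] startKeys = table.getStartKeys();   // expected: one entry per region
    Text lastRegionStart = startKeys[startKeys.length - 1];
    // Scan from the last region's start key through the end of the table.
    HScannerInterface scanner = table.obtainScanner(columns, lastRegionStart);
    try {
      HStoreKey key = new HStoreKey();
      SortedMap<Text, byte[]> row = new TreeMap<Text, byte[]>();
      while (scanner.next(key, row)) {
        // process the row here ...
        row.clear();
      }
    } finally {
      scanner.close();
    }
  }
}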
 > > -----Original Message-----
 > > From: stack [mailto:[EMAIL PROTECTED]
 > > Sent: Sunday, March 30, 2008 9:36 PM
 > > To: [EMAIL PROTECTED]
 > > Subject: Re: StackOverFlow Error in HBase
 > >
 > > You're doing nothing wrong.
 > >
 > > The filters as written recurse until they find a match. If there are
 > > long stretches between matching rows, then you will get a
 > > StackOverflowError. Filters need to be changed. Thanks for pointing
 > > this out. Can you do without them for the moment until we get a
 > > chance to fix it? (HBASE-554)
 > >
 > > Thanks,
 > > St.Ack
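(To make the failure mode concrete, here is a self-contained sketch, not
the actual HBase code, of why recursing past non-matching rows overflows
the stack while a plain loop does not:)

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class FilterSkipSketch {
  // Stand-in for a row filter: only one row in a million matches.
  static boolean matches(int row) { return row % 1000000 == 0; }

  // Recursive skip: one stack frame per filtered-out row, so a long
  // stretch of non-matching rows throws StackOverflowError.
  static Integer nextRecursive(Iterator<Integer> rows) {
    if (!rows.hasNext()) return null;
    int row = rows.next();
    return matches(row) ? Integer.valueOf(row) : nextRecursive(rows);
  }

  // Iterative skip: constant stack depth regardless of the stretch.
  static Integer nextIterative(Iterator<Integer> rows) {
    while (rows.hasNext()) {
      int row = rows.next();
      if (matches(row)) return Integer.valueOf(row);
    }
    return null;
  }

  public static void main(String[] args) {
    List<Integer> rows = new ArrayList<Integer>();
    for (int i = 1; i <= 2000000; i++) rows.add(Integer.valueOf(i));
    System.out.println(nextIterative(rows.iterator()));  // prints 1000000
    // nextRecursive(rows.iterator()) would blow the stack here.
  }
}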


 > > David Alves wrote:
 > > > Hi St.Ack and all
 > > >
 > > >   The error always occurs when trying to see if there are more rows
 > > > to process.
 > > >   Yes, I'm using a filter (RegExpRowFilter) to select only the rows
 > > > (any row key) that match a specific value in one of the columns.
 > > >   Then I obtain the scanner, just test the hasNext method, close the
 > > > scanner and return.
 > > >   Am I doing something wrong?
 > > >   Still, StackOverflowError is not supposed to happen, right?
 > > >
 > > > Regards
 > > > David Alves
 > > > On Thu, 2008-03-27 at 12:36 -0700, stack wrote:
 > > >
 > > >> You are using a filter?  If so, tell us more about it.
 > > >> St.Ack
 > > >>
 > > >> David Alves wrote:
 > > >>
 > > >>> Hi guys
 > > >>>
 > > >>>         I'm using HBase to keep data that is later indexed.
 > > >>>         The data is indexed in chunks, so the cycle is: get XXXX
 > > >>> records, index them, check for more records, etc...
 > > >>>         When I tried the candidate-2 instead of the old 0.16.0
 > > >>> (which I switched to due to the regionservers becoming
 > > >>> unresponsive) I got the error at the end of this email, well into
 > > >>> an indexing job.
 > > >>>         So do you have any idea why? Am I doing something wrong?
 > > >>>
 > > >>> David Alves
 > > >>>
 > > >>> java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException:
 > > >>> java.io.IOException: java.lang.StackOverflowError
 > > >>>         at java.io.DataInputStream.readFully(DataInputStream.java:178)
 > > >>>         at java.io.DataInputStream.readLong(DataInputStream.java:399)
 > > >>>         at org.apache.hadoop.dfs.DFSClient$BlockReader.readChunk(DFSClient.java:735)
 > > >>>         at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:234)
 > > >>>         at org.apache.hadoop.fs.FSInputChecker.fill(FSInputChecker.java:176)
 > > >>>         at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:193)
 > > >>>         at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:157)
 > > >>>         at org.apache.hadoop.dfs.DFSClient$BlockReader.read(DFSClient.java:658)
 > > >>>         at org.apache.hadoop.dfs.DFSClient$DFSInputStream.readBuffer(DFSClient.java:1130)
 > > >>>         at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1166)
 > > >>>         at java.io.DataInputStream.readFully(DataInputStream.java:178)
 > > >>>         at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:56)
 > > >>>         at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:90)
 > > >>>         at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1829)
 > > >>>         at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1729)
 > > >>>         at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1775)
 > > >>>         at org.apache.hadoop.io.MapFile$Reader.next(MapFile.java:461)
 > > >>>         at org.apache.hadoop.hbase.HStore$StoreFileScanner.getNext(HStore.java:2350)
 > > >>>         at org.apache.hadoop.hbase.HAbstractScanner.next(HAbstractScanner.java:256)
 > > >>>         at org.apache.hadoop.hbase.HStore$HStoreScanner.next(HStore.java:2561)
 > > >>>         at org.apache.hadoop.hbase.HRegion$HScanner.next(HRegion.java:1807)
 > > >>>         at org.apache.hadoop.hbase.HRegion$HScanner.next(HRegion.java:1843)
 > > >>>         at org.apache.hadoop.hbase.HRegion$HScanner.next(HRegion.java:1843)
 > > >>>         at org.apache.hadoop.hbase.HRegion$HScanner.next(HRegion.java:1843)
 > > >>>         at org.apache.hadoop.hbase.HRegion$HScanner.next(HRegion.java:1843)
 > > >>>         at org.apache.hadoop.hbase.HRegion$HScanner.next(HRegion.java:1843)
 > > >>>         at org.apache.hadoop.hbase.HRegion$HScanner.next(HRegion.java:1843)
 > > >>>         at org.apache.hadoop.hbase.HRegion$HScanner.next(HRegion.java:1843)
 > > >>>         at org.apache.hadoop.hbase.HRegion$HScanner.next(HRegion.java:1843)
 > > >>>         at org.apache.hadoop.hbase.HRegion$HScanner.next(HRegion.java:1843)
 > > >>>         at org.apache.hadoop.hbase.HRegion$HScanner.next(HRegion.java:1843)
 > > >>>         at org.apache.hadoop.hbase.HRegion$HScanner.next(HRegion.java:1843)
 > > >>>         at org.apache.hadoop.hbase.HRegion$HScanner.next(HRegion.java:1843)
 > > >>>         at org.apache.hadoop.hbase.HRegion$HScanner.next(HRegion.java:1843)
 > > >>>         at org.apache.hadoop.hbase.HRegion$HScanner.next(HRegion.java:1843)
 > > >>>         at org.apache.hadoop.hbase.HRegion$HScanner.next(HRegion.java:1843)
 > > >>>         at org.apache.hadoop.hbase.HRegion$HScanner.next(HRegion.java:1843)
 > > >>>         at org.apache.hadoop.hbase.HRegion$HScanner.next(HRegion.java:1843)
 > > >>> ...



