[ https://issues.apache.org/jira/browse/HBASE-679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12614551#action_12614551 ]

Izaak Rubin commented on HBASE-679:
-----------------------------------

I've been going through this issue for a little while now, and I think I have 
an idea of what ultimately needs to be done.

First, I was wrong in my last comment when I said that the problem had changed
from the original issue description. I've observed the problem in the HBase
instance that Jim is running from the aa0-00... machines. And, after a bit
more fiddling with the UI and some data on my own computer, I think I'm seeing
traces of the same problem there as well.

In my opinion, this issue boils down to exactly the same problem we saw in
HBASE-727: we are caching regionserver location information that will often
be wrong. In HBASE-727 we were able to work around this with a hack: if the
regionserver location was wrong and the page still failed to load after
several tries, we threw an IOException into the system, which triggered enough
internal activity that the correct location usually came back. As the one who
wrote the patch for HBASE-727, I have to admit that it is a workaround, not a
real fix.
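For concreteness, here is a minimal Java sketch of a retry-and-invalidate
pattern, which is roughly what the HBASE-727 workaround achieves indirectly,
done explicitly: when a call against a cached address fails, that entry is
dropped so the next attempt goes back to the authoritative source instead of
reusing the stale address. `LocationCache`, `Locator` and `ServerCall` are
hypothetical names for illustration, not HBase's client API.

```java
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical client-side cache of region name -> "host:port" (not HBase's API). */
final class LocationCache {
  /** Hypothetical authoritative lookup, e.g. a fresh read of .META. */
  interface Locator {
    String locate(String regionName) throws IOException;
  }

  /** Hypothetical operation run against the server at the given address. */
  interface ServerCall<T> {
    T call(String address) throws IOException;
  }

  private final Map<String, String> cache = new ConcurrentHashMap<>();
  private final Locator locator;
  private final int maxAttempts;

  LocationCache(Locator locator, int maxAttempts) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    this.locator = locator;
    this.maxAttempts = maxAttempts;
  }

  /** Runs op against the region's server, re-locating after each failed call. */
  <T> T withRegion(String regionName, ServerCall<T> op) throws IOException {
    IOException last = null;
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      String address = cache.get(regionName);
      if (address == null) {
        // Cache miss: ask the authoritative source and remember the answer.
        address = locator.locate(regionName);
        cache.put(regionName, address);
      }
      try {
        return op.call(address);
      } catch (IOException e) {
        // The cached address may be stale: drop it (only if nobody replaced it
        // meanwhile) so the next attempt performs a fresh lookup.
        cache.remove(regionName, address);
        last = e;
      }
    }
    throw last;
  }
}
```

The point of the design is that a failure is treated as evidence the cache is
stale, rather than relying on a thrown exception to shake things loose
somewhere else.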

Now, in this issue, the cached information returned by
HRegionInfo.getValue().getBindAddress() is also incorrect. As with HBASE-727,
it isn't consistently incorrect either: sometimes the address that comes back
is correct, and sometimes it isn't. Sometimes a few page refreshes fix it, and
sometimes it stays wrong indefinitely.

There are a number of things that can be done, both for this particular issue 
and for the larger problem in general.  

For HBASE-679:
 * I can modify table.jsp to fail a bit more gracefully when the wrong address 
comes back.  We could even put up a picture of the twitter fail whale: 
http://www.pestaola.gr/img1/twitter-whale.png (just kidding!)
 * We can leave everything as is and just let the user get an uninformative
HTTP 500 error in certain cases. Maybe it's better to tell them nothing if it
isn't working?
 * I can try to make some kind of hacky fix. I'm not sure this will work,
though, since throwing an exception from inside a .jsp page won't really do
anything. The fix would have to be in the Java code, but then there is the
question of how to determine whether the address is right.
 * We can punt on the issue until the overall problem is fixed (see below).
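A minimal sketch of the first option, assuming table.jsp routes each address
lookup through a small helper (`ServerCell.render` is a hypothetical name, not
existing HBase code). It only handles lookups that fail outright or return
nothing; a wrong but well-formed address is still rendered as-is, which is
exactly the detection problem from the third option.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper for rendering a region's server column in table.jsp. */
final class ServerCell {
  private ServerCell() {}

  /**
   * Returns the address produced by addressLookup, or a readable placeholder if
   * the lookup fails or returns nothing, instead of letting the failure escape
   * the page as an HTTP 500.
   */
  static String render(Callable<String> addressLookup) {
    try {
      String address = addressLookup.call();
      if (address == null || address.isEmpty()) {
        return "unknown (no server address yet; try reloading)";
      }
      return address;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();  // preserve the interrupt for the container
      return "unavailable (interrupted)";
    } catch (Exception e) {
      return "unavailable (" + e.getClass().getSimpleName() + "; try reloading)";
    }
  }
}
```

In a real page the returned string should also be HTML-escaped before it is
written out.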

For the greater problem of incorrect caching:
 * I would ask why we are caching this information in the first place.
Information like regionserver host:port is likely to change when the user
shuts down and restarts HBase. There needs to be some way for every node to be
informed of fresh location data when HBase starts. Admittedly, I am still
quite naive about the inner workings and practicalities of HBase, and I
imagine this is a lot easier said than done; it would probably get pushed to
0.3. Still, I maintain that it is worth doing. This problem keeps manifesting
itself in interesting ways all over the UI, and if we really want the UI to be
a reliable reflection of the data in HBase, we need to make this fix.
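One possible shape for such a fix, sketched under assumptions: each cached
location also remembers a start code of the regionserver process it was
learned from (a value assumed to change on every restart), and an entry whose
start code no longer matches the live server at that address is thrown away.
`StartCodeAwareCache` and the `currentStartCodes` map are hypothetical; how
fresh start codes would reach every node is exactly the open question above.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical location cache whose entries remember the start code of the
 * regionserver process they were learned from. Not thread-safe (plain HashMap).
 */
final class StartCodeAwareCache {
  private static final class Entry {
    final String address;
    final long startCode;

    Entry(String address, long startCode) {
      this.address = address;
      this.startCode = startCode;
    }
  }

  private final Map<String, Entry> byRegion = new HashMap<>();

  void put(String region, String address, long startCode) {
    byRegion.put(region, new Entry(address, startCode));
  }

  /**
   * Returns the cached address for region, or null if there is none or if the
   * server at that address has restarted since the entry was cached.
   * currentStartCodes maps "host:port" to the start code of the live process
   * there (hypothetical input, refreshed from wherever live-server data comes).
   */
  String get(String region, Map<String, Long> currentStartCodes) {
    Entry e = byRegion.get(region);
    if (e == null) {
      return null;
    }
    Long live = currentStartCodes.get(e.address);
    if (live == null || live.longValue() != e.startCode) {
      // Server gone or restarted: the entry predates the restart, so evict it
      // and let the caller do a fresh lookup.
      byRegion.remove(region);
      return null;
    }
    return e.address;
  }
}
```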

> Regionserver addresses are still not right in the new tables page
> -----------------------------------------------------------------
>
>                 Key: HBASE-679
>                 URL: https://issues.apache.org/jira/browse/HBASE-679
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: Izaak Rubin
>         Attachments: ms.patch, ms_revised.patch
>
>
> They are mostly right.
> I'm guessing it's a stale cache of regions in the client hosted by the UI.
> If the webserver ran a scan, it'd probably fix it all up, but that's a bit
> messy. I tried using the address that is in the .META. table directly but
> that doesn't work... we don't seem to deploy the table properly and the UI
> complains "No server address for row TestTable,,1213074650399". I'll attach
> my patch.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
