[
https://issues.apache.org/jira/browse/HBASE-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13076045#comment-13076045
]
[email protected] commented on HBASE-4014:
------------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/969/#review1261
-----------------------------------------------------------
Sorry for the belated review.
The additional explicit error handling around coprocessor invocations here is
good, but I think this is still missing part of the intent of the original bug
description. I'm concerned about aborts that happen not just resulting
directly from unhandled exceptions in coprocessor code, but from bad
coprocessor behavior that eventually triggers an abort from within core _HBase_
code. This could be a memory leak in the coprocessor that eventually triggers
an OOME that only shows up in reading in the next RPC request, or something
that corrupts internal state or data structures. As a result, I think it's
equally important that the loaded coprocessor set be logged _within_
HMaster.abort() and HRegionServer.abort(), in all cases. So I don't think
logging from the CoprocessorHost implementations is sufficient.
For HRegionServer at least, since RegionCoprocessorHost is associated at the
HRegion level, this would need a distinct set of all coprocessor classes that
have been loaded in that region server's lifetime, accessible at the top
(server) level. (I thought we discussed this at some point, but can't find
those comments -- maybe I'm hearing voices?) I think a simple singleton would
work, with something like an internal HashSet and an add(String
coprocessorClass) method that's called on coprocessor load. The same would
work on HMaster, though it's not strictly necessary since only a single
MasterCoprocessorHost instance is created at the HMaster level. But
consistency in implementation here is probably good. The number of
CoprocessorHost types we have may increase if new Observer types are added.
I think what's here is good, with only some minor nits on naming. But I think
the above enhancement is pretty critical to add.
src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java
<https://reviews.apache.org/r/969/#comment2892>
I saw in previous comments that this is needed because SortedCopyOnWriteSet
doesn't implement toString(). So why not make SortedCopyOnWriteSet implement
toString()? Seems cleaner to me and more generic/reusable.
Wherever this is implemented, use a StringBuilder to create the string to
return, not repeated string concatenation.
src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java
<https://reviews.apache.org/r/969/#comment2894>
Awfully long name, maybe just abortServer()?
src/main/java/org/apache/hadoop/hbase/master/MasterCoprocessorHost.java
<https://reviews.apache.org/r/969/#comment2895>
This and the following method names are awfully long. It's just personal
preference, but I like to keep things shorter.
src/test/java/org/apache/hadoop/hbase/coprocessor/BuggyRegionObserver.java
<https://reviews.apache.org/r/969/#comment2893>
Minor nit, but why have this as a separate top-level class? Seems like it
could just be an inner class in TestRegionServerCoprocessorException same way
that the BuggyMasterObserver is an inner class in
TestMasterCoprocessorException.
- Gary
On 2011-08-01 22:08:27, Eugene Koontz wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/969/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2011-08-01 22:08:27)
bq.
bq.
bq. Review request for hbase, Gary Helmling and Mingjie Lai.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. https://issues.apache.org/jira/browse/HBASE-4014 Coprocessors: Flag the
presence of coprocessors in logged exceptions
bq.
bq. The general gist here is to wrap each of
{Master,RegionServer}CoprocessorHost's coprocessor call inside a
bq.
bq. "try { ... } catch (Throwable e) { handleCoprocessorThrowable(e) }"
bq.
bq. block.
bq.
bq. handleCoprocessorThrowable() is responsible for either passing 'e' along
to the client (if 'e' is an IOException) or, otherwise, aborting the service
(Regionserver or Master).
bq.
bq. The abort message contains a list of the loaded coprocessors for crash
analysis.
bq.
bq.
bq. This addresses bug HBASE-4014.
bq. https://issues.apache.org/jira/browse/HBASE-4014
bq.
bq.
bq. Diffs
bq. -----
bq.
bq. src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java
18ba6e7
bq. src/main/java/org/apache/hadoop/hbase/master/MasterCoprocessorHost.java
aa930f5
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/RegionCoprocessorHost.java
c2b3558
bq.
src/test/java/org/apache/hadoop/hbase/coprocessor/BuggyRegionObserver.java
PRE-CREATION
bq.
src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterCoprocessorException.java
PRE-CREATION
bq.
src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorException.java
PRE-CREATION
bq.
bq. Diff: https://reviews.apache.org/r/969/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq. patch includes two tests:
bq.
bq. TestMasterCoprocessorException.java
bq. TestRegionServerCoprocessorException.java
bq.
bq. both tests pass in my build environment.
bq.
bq.
bq. Thanks,
bq.
bq. Eugene
bq.
bq.
> Coprocessors: Flag the presence of coprocessors in logged exceptions
> --------------------------------------------------------------------
>
> Key: HBASE-4014
> URL: https://issues.apache.org/jira/browse/HBASE-4014
> Project: HBase
> Issue Type: Improvement
> Components: coprocessors
> Reporter: Andrew Purtell
> Assignee: Eugene Koontz
> Fix For: 0.92.0
>
> Attachments: HBASE-4014.patch, HBASE-4014.patch, HBASE-4014.patch,
> HBASE-4014.patch
>
>
> For some initial triage of bug reports for core versus for deployments with
> loaded coprocessors, we need something like the Linux kernel's taint flag,
> and list of linked in modules that show up in the output of every OOPS, to
> appear above or below exceptions that appear in the logs.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira