Hi Daan,
Thanks for the clarification. I will try to do what you said. But it might be 
hard for me to trigger some of them because they’re in exception handling code. 
Still I will try.

In the meantime, please let me know if there is anything I can further help 
from my side.

Ding


On Apr 5, 2014, at 5:02 AM, Daan Hoogland <daan.hoogl...@gmail.com> wrote:

> H Ding,
> 
> I didn't mean for you to test it in a certain way. I was just curious
> about what you did with the patch before you submitted it. Did you
> start a cloud instance with it and try to hit any of the log messages
> for instance. You explained the background of your effort and I am
> curious as to how it satisfied your objectives.
> 
> I will let the patch rest a few days to see if we get any more
> reactions and apply it from Denver next week.
> 
> On Sat, Apr 5, 2014 at 2:03 AM, Ding Yuan <y...@ece.utoronto.ca> wrote:
>> HI Daan,
>> I am not sure exactly how to monkey-test cloudstack, what I did was to do
>> $ mvn test
>> which shows "BUILD SUCCESS". I also did this:
>> $ mvn clean install -P systemvm,developer
>> which also succeeded.
>> 
>> Is that what you mean? If not, please let me know what to do and I will 
>> further test it.
>> I will also work on assigning the proper reviewers now.
>> 
>> Thanks,
>> Ding
>> 
>> On Apr 3, 2014, at 12:04 PM, Daan Hoogland <daan.hoogl...@gmail.com> wrote:
>> 
>>> thanks Ding,
>>> 
>>> I saw your update. Did your run a cloud with this code; i.e. did you
>>> monkey-test it?
>>> 
>>> On Thu, Apr 3, 2014 at 5:26 PM, Ding Yuan <y...@ece.utoronto.ca> wrote:
>>>> Oops, sorry I didn't publish my diff. I just published it on review board.
>>>> Thanks for the comment Daan! Please let me know if I further need to 
>>>> improve it.
>>>> 
>>>> Ding
>>>> 
>>>> On Apr 3, 2014, at 9:52 AM, Daan Hoogland <daan.hoogl...@gmail.com> wrote:
>>>> 
>>>>> Ding,
>>>>> 
>>>>> I think we can dare to do so in master as it will not see release for
>>>>> a while. We'll just have to be aware of the locations and be on alert
>>>>> for any stacktraces that will pass this list. I would not like to do
>>>>> this on the 4.4 branch even when it is an improvement of code quality
>>>>> as such. It might do things or prevent things from happening that we
>>>>> need done.
>>>>> 
>>>>> I don't see a new version of the diff in the review request. Did you
>>>>> 'Update' -> 'Upload Diff'?
>>>>> 
>>>>> regards,
>>>>> Daan
>>>>> 
>>>>> On Thu, Apr 3, 2014 at 12:34 AM, Ding Yuan <y...@ece.utoronto.ca> wrote:
>>>>>> Uploaded a new patch to 19917. Changed the verbosity to debug, and 
>>>>>> addressed
>>>>>> Daan's comment on providing more distinctive text messages.
>>>>>> 
>>>>>> Sorry that I haven't split them into smaller patches.
>>>>>> 
>>>>>> Note in a few cases the original code was like:
>>>>>>       try {
>>>>>>          pstmt = txn.prepareAutoCloseStatement(sql);
>>>>>>          String gmtCutTime =
>>>>>> DateUtil.getDateDisplayString(TimeZone.getTimeZone("GMT"), cutTime);
>>>>>>          pstmt.setString(1, gmtCutTime);
>>>>>>          pstmt.setString(2, gmtCutTime);
>>>>>> 
>>>>>>          ResultSet rs = pstmt.executeQuery();
>>>>>>          while (rs.next()) {
>>>>>>              RunningHostCountInfo info = new RunningHostCountInfo();
>>>>>>              info.setDcId(rs.getLong(1));
>>>>>>              info.setHostType(rs.getString(2));
>>>>>>              info.setCount(rs.getInt(3));
>>>>>> 
>>>>>>              l.add(info);
>>>>>>                     }
>>>>>>               } catch (SQLException e) {
>>>>>>               } catch (Throwable e) {
>>>>>>               }
>>>>>> 
>>>>>> The try block only throws SQLException as checked exception, and this 
>>>>>> code
>>>>>> would also swallow any unchecked exceptions. I removed the catch 
>>>>>> (Throwable)
>>>>>> in these cases to avoid potentially swallowing any unexpected runtime
>>>>>> exceptions. Please let me know if this is not desirable so I can further
>>>>>> update.
>>>>>> 
>>>>>> Thanks,
>>>>>> Ding
>>>>>> 
>>>>>> On Apr 2, 2014, at 5:17 PM, Ding Yuan <y...@eecg.toronto.edu> wrote:
>>>>>> 
>>>>>> Thanks all for the quick comments!
>>>>>> If i understand the discussion correctly, I will just change all the 
>>>>>> added
>>>>>> log printing statements to debug verbosity. I will upload a new patch for
>>>>>> that shortly.
>>>>>> 
>>>>>> Now a bit back story: the reason we are doing this is that we recently 
>>>>>> did
>>>>>> an analysis on many bugs collected from JIRA to understand why today's 
>>>>>> cloud
>>>>>> system fails. And we found that almost all of the cluster-wide failures 
>>>>>> are
>>>>>> caused by buggy exception handling, which would often turn a component
>>>>>> failure into a cluster-wide one. One of the common bug pattern is 
>>>>>> ignoring
>>>>>> some exceptions -- allowing them to propagate and finally turn into 
>>>>>> disaster.
>>>>>> Therefore we built a simple static checking tool just to check some 
>>>>>> simple
>>>>>> rules for exception handling, such as if an exception is ignored.
>>>>>> Admittedly, it would be much harder to reason about the potential
>>>>>> consequences caused for ignoring a certain exception, that's why without
>>>>>> much more domain knowledge I can only recommend to 1) avoid 
>>>>>> over-catching an
>>>>>> exception, especially when the handling logic will swallow it, and 2) log
>>>>>> them, as what this patch does.
>>>>>> 
>>>>>> Nevertheless, it seems the four cases I mentioned in JIRA-6242 are
>>>>>> particularly suspicious. It might be worthwhile to double check their
>>>>>> correctness if you have time. I am reposting them below.
>>>>>> 
>>>>>> Thanks!
>>>>>> Ding
>>>>>> 
>>>>>> =========================
>>>>>> Case 1:
>>>>>> Line: 375, File: "com/cloud/network/ovs/OvsTunnelManagerImpl.java"
>>>>>> 
>>>>>> 326: try {
>>>>>> 327: String myIp = getGreEndpointIP(dest.getHost(), nw);
>>>>>> .. .. ..
>>>>>> 373: } catch (Exception e) {
>>>>>> 374: // I really thing we should do a better handling of these exceptions
>>>>>> 375: s_logger.warn("Ovs Tunnel network created tunnel failed", e);
>>>>>> 376: }
>>>>>> 
>>>>>> Comment seems to suggest for a better handling.
>>>>>> =========================
>>>>>> =========================
>>>>>> Case 2:
>>>>>> Line: 2295, File: "com/cloud/resource/ResourceManagerImpl.java"
>>>>>> 
>>>>>> 2284: try {
>>>>>> 2285: /*
>>>>>> 2286: * FIXME: this is a buggy logic, check with alex. Shouldn't
>>>>>> 2287: * return if propagation return non null
>>>>>> 2288: */
>>>>>> 2289: Boolean result = _clusterMgr.propagateResourceEvent(h.getId(),
>>>>>> ResourceState.Event.UpdatePassword);
>>>>>> 2290: if (result != null) {
>>>>>> 2291: return result;
>>>>>> 2292: }
>>>>>> 2293:
>>>>>> 2294: doUpdateHostPassword(h.getId());
>>>>>> 2295: } catch (AgentUnavailableException e) {
>>>>>> 2296: }
>>>>>> 
>>>>>> Seem from the comment the logic should be fixed.
>>>>>> A similar code snipeet is at:
>>>>>> Line: 2276, File: "com/cloud/resource/ResourceManagerImpl.java"
>>>>>> =========================
>>>>>> 
>>>>>> =========================
>>>>>> Case 3:
>>>>>> 
>>>>>> Line: 184, File:
>>>>>> "org/apache/cloudstack/api/command/user/autoscale/CreateAutoScaleVmGroupCmd.java"
>>>>>> 
>>>>>> 174: try
>>>>>> 175: {
>>>>>> 176: success = _autoScaleService.configureAutoScaleVmGroup(this);
>>>>>> 177: if (success) {
>>>>>> 178: vmGroup = _entityMgr.findById(AutoScaleVmGroup.class, 
>>>>>> getEntityId());
>>>>>> 179: AutoScaleVmGroupResponse responseObject =
>>>>>> _responseGenerator.createAutoScaleVmGroupResponse(vmGroup);
>>>>>> 180: setResponseObject(responseObject);
>>>>>> 181: responseObject.setResponseName(getCommandName());
>>>>>> 182: }
>>>>>> 183: } catch (Exception ex) {
>>>>>> 184: // TODO what will happen if Resource Layer fails in a step inbetween
>>>>>> 185: s_logger.warn("Failed to create autoscale vm group", ex);
>>>>>> 186: }
>>>>>> 
>>>>>> The comment, "TODO", seems to suggest for better handling.
>>>>>> =========================
>>>>>> =========================
>>>>>> Case 4:
>>>>>> 
>>>>>> Line: 222, File: "com/cloud/api/ApiDispatcher.java"
>>>>>> This snippet is moved to ParamProcessWorker.java in the trunk.
>>>>>> 
>>>>>> 203: try {
>>>>>> 204: setFieldValue(field, cmd, paramObj, parameterAnnotation);
>>>>>>   .. ..
>>>>>> 220: } catch (CloudRuntimeException cloudEx) {
>>>>>> 221: s_logger.error("CloudRuntimeException", cloudEx);
>>>>>> 222: // FIXME: Better error message? This only happens if the API 
>>>>>> command is
>>>>>> not executable, which typically
>>>>>> 223: //means
>>>>>> 224: // there was
>>>>>> 225: // and IllegalAccessException setting one of the parameters.
>>>>>> 226: throw new ServerApiException(ApiErrorCode.INTERNAL_ERROR, "Internal
>>>>>> error executing API command " + cmd.getCommandName().substring(0,
>>>>>> cmd.getCommandName().length() - 8));
>>>>>> 227: }
>>>>>> 
>>>>>> The "FIXME" comment seems to suggest for getter error message.
>>>>>> =========================
>>>>>> On Apr 2, 2014, at 4:37 PM, Daan Hoogland <daan.hoogl...@gmail.com> 
>>>>>> wrote:
>>>>>> 
>>>>>> Ding Yuan,
>>>>>> 
>>>>>> Any objections to that?
>>>>>> 
>>>>>> On Wed, Apr 2, 2014 at 10:32 PM, Alena Prokharchyk
>>>>>> <alena.prokharc...@citrix.com> wrote:
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On 4/2/14, 1:27 PM, "Daan Hoogland" <daan.hoogl...@gmail.com> wrote:
>>>>>> 
>>>>>> I think we agree indeed. Doesn't mean we should start this discuss
>>>>>> thread or write a arch guideline on the wiki somewhere. Maybe Ding
>>>>>> Yuan wants to do a preliminary version of it?
>>>>>> 
>>>>>> 
>>>>>> Wiki guide would be useful indeed.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> In the meantime I don't think that it hurts for the present patch to
>>>>>> do everything in debug and decide about higher levels needed later.
>>>>>> 
>>>>>> 
>>>>>> Agree.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> regards,
>>>>>> 
>>>>>> On Wed, Apr 2, 2014 at 10:11 PM, Alena Prokharchyk
>>>>>> <alena.prokharc...@citrix.com> wrote:
>>>>>> 
>>>>>> Daan,
>>>>>> 
>>>>>> Correct me if I¹m wrong, but all of the logging added by Ding, fall
>>>>>> under
>>>>>> "to go with it or to indicate passing a certain code path². I¹ve just
>>>>>> noticed that some of them were added with DEBUG, and some with WARN
>>>>>> level,
>>>>>> and wanted to correct that.
>>>>>> 
>>>>>> So we should:
>>>>>> 
>>>>>> 1) For sure: never print them out in WARN as there is nothing admin
>>>>>> should
>>>>>> do in this case, because the code just handles them by ignoring.
>>>>>> 2) Figure out what would be the correct level to log them with: INFO or
>>>>>> DEBUG
>>>>>> 
>>>>>> From ³Logging best practices² articles, I can see that people use INFO
>>>>>> as
>>>>>> a ³storyline² of normal application behavior, and DEBUG for sort of
>>>>>> information that helps you to track down the failure cases scenarios.
>>>>>> To
>>>>>> me, stuff added by Ding, falls under second category. But I might be
>>>>>> wrong
>>>>>> as I don¹t recall on the spot any discussions happening on the debug
>>>>>> topic, from the mailing list.
>>>>>> 
>>>>>> -Alena.
>>>>>> 
>>>>>> 
>>>>>> On 4/2/14, 12:57 PM, "Daan Hoogland" <daan.hoogl...@gmail.com> wrote:
>>>>>> 
>>>>>> Alena,
>>>>>> 
>>>>>> What I read in your comment is a description of INFO vs WARN. Debug
>>>>>> would be only for outputting stacktraces to go with it or to indicate
>>>>>> passing a certain code path.
>>>>>> 
>>>>>> Agree?
>>>>>> 
>>>>>> On Wed, Apr 2, 2014 at 8:31 PM, Alena Prokharchyk
>>>>>> <alena.prokharc...@citrix.com> wrote:
>>>>>> 
>>>>>> 
>>>>>> -----------------------------------------------------------
>>>>>> This is an automatically generated e-mail. To reply, visit:
>>>>>> https://reviews.apache.org/r/19917/#review39324
>>>>>> -----------------------------------------------------------
>>>>>> 
>>>>>> 
>>>>>> Is there a reason why logs for some exceptions are being logged in
>>>>>> DEBUG mode, and some in WARN? From my point of view, if the code only
>>>>>> catches it and doesn't error out, it should be logged in DEBUG. Lots of
>>>>>> Admins are seeking for WARN statements in the log, and they might be
>>>>>> confused seeing WARN w/o further failure or retry.
>>>>>> 
>>>>>> - Alena Prokharchyk
>>>>>> 
>>>>>> 
>>>>>> On April 2, 2014, 1:55 p.m., Ding Yuan wrote:
>>>>>> 
>>>>>> 
>>>>>> -----------------------------------------------------------
>>>>>> This is an automatically generated e-mail. To reply, visit:
>>>>>> https://reviews.apache.org/r/19917/
>>>>>> -----------------------------------------------------------
>>>>>> 
>>>>>> (Updated April 2, 2014, 1:55 p.m.)
>>>>>> 
>>>>>> 
>>>>>> Review request for cloudstack.
>>>>>> 
>>>>>> 
>>>>>> Repository: cloudstack-git
>>>>>> 
>>>>>> 
>>>>>> Description
>>>>>> -------
>>>>>> 
>>>>>> This is the patch for JIRA-6242. See
>>>>>> https://issues.apache.org/jira/browse/CLOUDSTACK-6242 for more
>>>>>> details.
>>>>>> Thanks!
>>>>>> 
>>>>>> 
>>>>>> Diffs
>>>>>> -----
>>>>>> 
>>>>>> 
>>>>>> engine/orchestration/src/com/cloud/agent/manager/AgentManagerImpl.java
>>>>>> 0d41bc1
>>>>>> 
>>>>>> engine/orchestration/src/com/cloud/agent/manager/ClusteredAgentManager
>>>>>> Im
>>>>>> pl.java 01508a4
>>>>>> 
>>>>>> engine/orchestration/src/com/cloud/vm/VirtualMachineManagerImpl.java
>>>>>> 3e088db
>>>>>> 
>>>>>> engine/orchestration/src/org/apache/cloudstack/engine/datacenter/entit
>>>>>> y/
>>>>>> api/db/dao/EngineDataCenterDaoImpl.java 4b6818e
>>>>>> engine/schema/src/com/cloud/dc/dao/DataCenterDaoImpl.java ea5039f
>>>>>> engine/schema/src/com/cloud/host/dao/HostDaoImpl.java 426c90d
>>>>>> engine/schema/src/com/cloud/storage/dao/StoragePoolHostDaoImpl.java
>>>>>> e42eaf4
>>>>>> engine/schema/src/com/cloud/storage/dao/VMTemplateDaoImpl.java
>>>>>> 34fdca5
>>>>>> engine/schema/src/com/cloud/upgrade/dao/Upgrade2214to30.java
>>>>>> 58dd916
>>>>>> engine/schema/src/com/cloud/vm/dao/ConsoleProxyDaoImpl.java 5e9c2f0
>>>>>> engine/schema/src/com/cloud/vm/dao/SecondaryStorageVmDaoImpl.java
>>>>>> 1f382d6
>>>>>> 
>>>>>> engine/storage/src/org/apache/cloudstack/storage/datastore/DataObjectM
>>>>>> an
>>>>>> agerImpl.java 6ed1274
>>>>>> 
>>>>>> framework/ipc/src/org/apache/cloudstack/framework/serializer/OnwireCla
>>>>>> ss
>>>>>> Registry.java 83c8a42
>>>>>> 
>>>>>> plugins/hypervisors/xen/src/com/cloud/hypervisor/xen/discoverer/XcpSer
>>>>>> ve
>>>>>> rDiscoverer.java 0ad6dc4
>>>>>> 
>>>>>> plugins/hypervisors/xen/src/com/cloud/hypervisor/xen/resource/XenServe
>>>>>> rC
>>>>>> onnectionPool.java b779085
>>>>>> 
>>>>>> plugins/hypervisors/xen/src/com/cloud/hypervisor/xen/resource/XenServe
>>>>>> rS
>>>>>> torageProcessor.java e512046
>>>>>> 
>>>>>> plugins/storage/volume/solidfire/src/org/apache/cloudstack/storage/dat
>>>>>> as
>>>>>> tore/lifecycle/SolidFirePrimaryDataStoreLifeCycle.java af6a77a
>>>>>> server/src/com/cloud/resource/ResourceManagerImpl.java f9a59ba
>>>>>> server/src/com/cloud/server/ConfigurationServerImpl.java b8da4c8
>>>>>> 
>>>>>> services/console-proxy/server/src/com/cloud/consoleproxy/ConsoleProxyT
>>>>>> hu
>>>>>> mbnailHandler.java 06f21d3
>>>>>> utils/src/com/cloud/utils/net/NetUtils.java 6350986
>>>>>> 
>>>>>> Diff: https://reviews.apache.org/r/19917/diff/
>>>>>> 
>>>>>> 
>>>>>> Testing
>>>>>> -------
>>>>>> 
>>>>>> 
>>>>>> Thanks,
>>>>>> 
>>>>>> Ding Yuan
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Daan
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Daan
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Daan
>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Daan
>>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Daan
>>> 
>> 
> 
> 
> 
> -- 
> Daan
> 

Reply via email to