Hi Daan, Thanks for the clarification. I will try to do what you said. But it might be hard for me to trigger some of them because they’re in exception handling code. Still I will try.
In the meantime, please let me know if there is anything I can further help from my side. Ding On Apr 5, 2014, at 5:02 AM, Daan Hoogland <daan.hoogl...@gmail.com> wrote: > H Ding, > > I didn't mean for you to test it in a certain way. I was just curious > about what you did with the patch before you submitted it. Did you > start a cloud instance with it and try to hit any of the log messages > for instance. You explained the background of your effort and I am > curious as to how it satisfied your objectives. > > I will let the patch rest a few days to see if we get any more > reactions and apply it from Denver next week. > > On Sat, Apr 5, 2014 at 2:03 AM, Ding Yuan <y...@ece.utoronto.ca> wrote: >> HI Daan, >> I am not sure exactly how to monkey-test cloudstack, what I did was to do >> $ mvn test >> which shows "BUILD SUCCESS". I also did this: >> $ mvn clean install -P systemvm,developer >> which also succeeded. >> >> Is that what you mean? If not, please let me know what to do and I will >> further test it. >> I will also work on assigning the proper reviewers now. >> >> Thanks, >> Ding >> >> On Apr 3, 2014, at 12:04 PM, Daan Hoogland <daan.hoogl...@gmail.com> wrote: >> >>> thanks Ding, >>> >>> I saw your update. Did your run a cloud with this code; i.e. did you >>> monkey-test it? >>> >>> On Thu, Apr 3, 2014 at 5:26 PM, Ding Yuan <y...@ece.utoronto.ca> wrote: >>>> Oops, sorry I didn't publish my diff. I just published it on review board. >>>> Thanks for the comment Daan! Please let me know if I further need to >>>> improve it. >>>> >>>> Ding >>>> >>>> On Apr 3, 2014, at 9:52 AM, Daan Hoogland <daan.hoogl...@gmail.com> wrote: >>>> >>>>> Ding, >>>>> >>>>> I think we can dare to do so in master as it will not see release for >>>>> a while. We'll just have to be aware of the locations and be on alert >>>>> for any stacktraces that will pass this list. I would not like to do >>>>> this on the 4.4 branch even when it is an improvement of code quality >>>>> as such. It might do things or prevent things from happening that we >>>>> need done. >>>>> >>>>> I don't see a new version of the diff in the review request. Did you >>>>> 'Update' -> 'Upload Diff'? >>>>> >>>>> regards, >>>>> Daan >>>>> >>>>> On Thu, Apr 3, 2014 at 12:34 AM, Ding Yuan <y...@ece.utoronto.ca> wrote: >>>>>> Uploaded a new patch to 19917. Changed the verbosity to debug, and >>>>>> addressed >>>>>> Daan's comment on providing more distinctive text messages. >>>>>> >>>>>> Sorry that I haven't split them into smaller patches. >>>>>> >>>>>> Note in a few cases the original code was like: >>>>>> try { >>>>>> pstmt = txn.prepareAutoCloseStatement(sql); >>>>>> String gmtCutTime = >>>>>> DateUtil.getDateDisplayString(TimeZone.getTimeZone("GMT"), cutTime); >>>>>> pstmt.setString(1, gmtCutTime); >>>>>> pstmt.setString(2, gmtCutTime); >>>>>> >>>>>> ResultSet rs = pstmt.executeQuery(); >>>>>> while (rs.next()) { >>>>>> RunningHostCountInfo info = new RunningHostCountInfo(); >>>>>> info.setDcId(rs.getLong(1)); >>>>>> info.setHostType(rs.getString(2)); >>>>>> info.setCount(rs.getInt(3)); >>>>>> >>>>>> l.add(info); >>>>>> } >>>>>> } catch (SQLException e) { >>>>>> } catch (Throwable e) { >>>>>> } >>>>>> >>>>>> The try block only throws SQLException as checked exception, and this >>>>>> code >>>>>> would also swallow any unchecked exceptions. I removed the catch >>>>>> (Throwable) >>>>>> in these cases to avoid potentially swallowing any unexpected runtime >>>>>> exceptions. Please let me know if this is not desirable so I can further >>>>>> update. >>>>>> >>>>>> Thanks, >>>>>> Ding >>>>>> >>>>>> On Apr 2, 2014, at 5:17 PM, Ding Yuan <y...@eecg.toronto.edu> wrote: >>>>>> >>>>>> Thanks all for the quick comments! >>>>>> If i understand the discussion correctly, I will just change all the >>>>>> added >>>>>> log printing statements to debug verbosity. I will upload a new patch for >>>>>> that shortly. >>>>>> >>>>>> Now a bit back story: the reason we are doing this is that we recently >>>>>> did >>>>>> an analysis on many bugs collected from JIRA to understand why today's >>>>>> cloud >>>>>> system fails. And we found that almost all of the cluster-wide failures >>>>>> are >>>>>> caused by buggy exception handling, which would often turn a component >>>>>> failure into a cluster-wide one. One of the common bug pattern is >>>>>> ignoring >>>>>> some exceptions -- allowing them to propagate and finally turn into >>>>>> disaster. >>>>>> Therefore we built a simple static checking tool just to check some >>>>>> simple >>>>>> rules for exception handling, such as if an exception is ignored. >>>>>> Admittedly, it would be much harder to reason about the potential >>>>>> consequences caused for ignoring a certain exception, that's why without >>>>>> much more domain knowledge I can only recommend to 1) avoid >>>>>> over-catching an >>>>>> exception, especially when the handling logic will swallow it, and 2) log >>>>>> them, as what this patch does. >>>>>> >>>>>> Nevertheless, it seems the four cases I mentioned in JIRA-6242 are >>>>>> particularly suspicious. It might be worthwhile to double check their >>>>>> correctness if you have time. I am reposting them below. >>>>>> >>>>>> Thanks! >>>>>> Ding >>>>>> >>>>>> ========================= >>>>>> Case 1: >>>>>> Line: 375, File: "com/cloud/network/ovs/OvsTunnelManagerImpl.java" >>>>>> >>>>>> 326: try { >>>>>> 327: String myIp = getGreEndpointIP(dest.getHost(), nw); >>>>>> .. .. .. >>>>>> 373: } catch (Exception e) { >>>>>> 374: // I really thing we should do a better handling of these exceptions >>>>>> 375: s_logger.warn("Ovs Tunnel network created tunnel failed", e); >>>>>> 376: } >>>>>> >>>>>> Comment seems to suggest for a better handling. >>>>>> ========================= >>>>>> ========================= >>>>>> Case 2: >>>>>> Line: 2295, File: "com/cloud/resource/ResourceManagerImpl.java" >>>>>> >>>>>> 2284: try { >>>>>> 2285: /* >>>>>> 2286: * FIXME: this is a buggy logic, check with alex. Shouldn't >>>>>> 2287: * return if propagation return non null >>>>>> 2288: */ >>>>>> 2289: Boolean result = _clusterMgr.propagateResourceEvent(h.getId(), >>>>>> ResourceState.Event.UpdatePassword); >>>>>> 2290: if (result != null) { >>>>>> 2291: return result; >>>>>> 2292: } >>>>>> 2293: >>>>>> 2294: doUpdateHostPassword(h.getId()); >>>>>> 2295: } catch (AgentUnavailableException e) { >>>>>> 2296: } >>>>>> >>>>>> Seem from the comment the logic should be fixed. >>>>>> A similar code snipeet is at: >>>>>> Line: 2276, File: "com/cloud/resource/ResourceManagerImpl.java" >>>>>> ========================= >>>>>> >>>>>> ========================= >>>>>> Case 3: >>>>>> >>>>>> Line: 184, File: >>>>>> "org/apache/cloudstack/api/command/user/autoscale/CreateAutoScaleVmGroupCmd.java" >>>>>> >>>>>> 174: try >>>>>> 175: { >>>>>> 176: success = _autoScaleService.configureAutoScaleVmGroup(this); >>>>>> 177: if (success) { >>>>>> 178: vmGroup = _entityMgr.findById(AutoScaleVmGroup.class, >>>>>> getEntityId()); >>>>>> 179: AutoScaleVmGroupResponse responseObject = >>>>>> _responseGenerator.createAutoScaleVmGroupResponse(vmGroup); >>>>>> 180: setResponseObject(responseObject); >>>>>> 181: responseObject.setResponseName(getCommandName()); >>>>>> 182: } >>>>>> 183: } catch (Exception ex) { >>>>>> 184: // TODO what will happen if Resource Layer fails in a step inbetween >>>>>> 185: s_logger.warn("Failed to create autoscale vm group", ex); >>>>>> 186: } >>>>>> >>>>>> The comment, "TODO", seems to suggest for better handling. >>>>>> ========================= >>>>>> ========================= >>>>>> Case 4: >>>>>> >>>>>> Line: 222, File: "com/cloud/api/ApiDispatcher.java" >>>>>> This snippet is moved to ParamProcessWorker.java in the trunk. >>>>>> >>>>>> 203: try { >>>>>> 204: setFieldValue(field, cmd, paramObj, parameterAnnotation); >>>>>> .. .. >>>>>> 220: } catch (CloudRuntimeException cloudEx) { >>>>>> 221: s_logger.error("CloudRuntimeException", cloudEx); >>>>>> 222: // FIXME: Better error message? This only happens if the API >>>>>> command is >>>>>> not executable, which typically >>>>>> 223: //means >>>>>> 224: // there was >>>>>> 225: // and IllegalAccessException setting one of the parameters. >>>>>> 226: throw new ServerApiException(ApiErrorCode.INTERNAL_ERROR, "Internal >>>>>> error executing API command " + cmd.getCommandName().substring(0, >>>>>> cmd.getCommandName().length() - 8)); >>>>>> 227: } >>>>>> >>>>>> The "FIXME" comment seems to suggest for getter error message. >>>>>> ========================= >>>>>> On Apr 2, 2014, at 4:37 PM, Daan Hoogland <daan.hoogl...@gmail.com> >>>>>> wrote: >>>>>> >>>>>> Ding Yuan, >>>>>> >>>>>> Any objections to that? >>>>>> >>>>>> On Wed, Apr 2, 2014 at 10:32 PM, Alena Prokharchyk >>>>>> <alena.prokharc...@citrix.com> wrote: >>>>>> >>>>>> >>>>>> >>>>>> On 4/2/14, 1:27 PM, "Daan Hoogland" <daan.hoogl...@gmail.com> wrote: >>>>>> >>>>>> I think we agree indeed. Doesn't mean we should start this discuss >>>>>> thread or write a arch guideline on the wiki somewhere. Maybe Ding >>>>>> Yuan wants to do a preliminary version of it? >>>>>> >>>>>> >>>>>> Wiki guide would be useful indeed. >>>>>> >>>>>> >>>>>> >>>>>> In the meantime I don't think that it hurts for the present patch to >>>>>> do everything in debug and decide about higher levels needed later. >>>>>> >>>>>> >>>>>> Agree. >>>>>> >>>>>> >>>>>> >>>>>> regards, >>>>>> >>>>>> On Wed, Apr 2, 2014 at 10:11 PM, Alena Prokharchyk >>>>>> <alena.prokharc...@citrix.com> wrote: >>>>>> >>>>>> Daan, >>>>>> >>>>>> Correct me if I¹m wrong, but all of the logging added by Ding, fall >>>>>> under >>>>>> "to go with it or to indicate passing a certain code path². I¹ve just >>>>>> noticed that some of them were added with DEBUG, and some with WARN >>>>>> level, >>>>>> and wanted to correct that. >>>>>> >>>>>> So we should: >>>>>> >>>>>> 1) For sure: never print them out in WARN as there is nothing admin >>>>>> should >>>>>> do in this case, because the code just handles them by ignoring. >>>>>> 2) Figure out what would be the correct level to log them with: INFO or >>>>>> DEBUG >>>>>> >>>>>> From ³Logging best practices² articles, I can see that people use INFO >>>>>> as >>>>>> a ³storyline² of normal application behavior, and DEBUG for sort of >>>>>> information that helps you to track down the failure cases scenarios. >>>>>> To >>>>>> me, stuff added by Ding, falls under second category. But I might be >>>>>> wrong >>>>>> as I don¹t recall on the spot any discussions happening on the debug >>>>>> topic, from the mailing list. >>>>>> >>>>>> -Alena. >>>>>> >>>>>> >>>>>> On 4/2/14, 12:57 PM, "Daan Hoogland" <daan.hoogl...@gmail.com> wrote: >>>>>> >>>>>> Alena, >>>>>> >>>>>> What I read in your comment is a description of INFO vs WARN. Debug >>>>>> would be only for outputting stacktraces to go with it or to indicate >>>>>> passing a certain code path. >>>>>> >>>>>> Agree? >>>>>> >>>>>> On Wed, Apr 2, 2014 at 8:31 PM, Alena Prokharchyk >>>>>> <alena.prokharc...@citrix.com> wrote: >>>>>> >>>>>> >>>>>> ----------------------------------------------------------- >>>>>> This is an automatically generated e-mail. To reply, visit: >>>>>> https://reviews.apache.org/r/19917/#review39324 >>>>>> ----------------------------------------------------------- >>>>>> >>>>>> >>>>>> Is there a reason why logs for some exceptions are being logged in >>>>>> DEBUG mode, and some in WARN? From my point of view, if the code only >>>>>> catches it and doesn't error out, it should be logged in DEBUG. Lots of >>>>>> Admins are seeking for WARN statements in the log, and they might be >>>>>> confused seeing WARN w/o further failure or retry. >>>>>> >>>>>> - Alena Prokharchyk >>>>>> >>>>>> >>>>>> On April 2, 2014, 1:55 p.m., Ding Yuan wrote: >>>>>> >>>>>> >>>>>> ----------------------------------------------------------- >>>>>> This is an automatically generated e-mail. To reply, visit: >>>>>> https://reviews.apache.org/r/19917/ >>>>>> ----------------------------------------------------------- >>>>>> >>>>>> (Updated April 2, 2014, 1:55 p.m.) >>>>>> >>>>>> >>>>>> Review request for cloudstack. >>>>>> >>>>>> >>>>>> Repository: cloudstack-git >>>>>> >>>>>> >>>>>> Description >>>>>> ------- >>>>>> >>>>>> This is the patch for JIRA-6242. See >>>>>> https://issues.apache.org/jira/browse/CLOUDSTACK-6242 for more >>>>>> details. >>>>>> Thanks! >>>>>> >>>>>> >>>>>> Diffs >>>>>> ----- >>>>>> >>>>>> >>>>>> engine/orchestration/src/com/cloud/agent/manager/AgentManagerImpl.java >>>>>> 0d41bc1 >>>>>> >>>>>> engine/orchestration/src/com/cloud/agent/manager/ClusteredAgentManager >>>>>> Im >>>>>> pl.java 01508a4 >>>>>> >>>>>> engine/orchestration/src/com/cloud/vm/VirtualMachineManagerImpl.java >>>>>> 3e088db >>>>>> >>>>>> engine/orchestration/src/org/apache/cloudstack/engine/datacenter/entit >>>>>> y/ >>>>>> api/db/dao/EngineDataCenterDaoImpl.java 4b6818e >>>>>> engine/schema/src/com/cloud/dc/dao/DataCenterDaoImpl.java ea5039f >>>>>> engine/schema/src/com/cloud/host/dao/HostDaoImpl.java 426c90d >>>>>> engine/schema/src/com/cloud/storage/dao/StoragePoolHostDaoImpl.java >>>>>> e42eaf4 >>>>>> engine/schema/src/com/cloud/storage/dao/VMTemplateDaoImpl.java >>>>>> 34fdca5 >>>>>> engine/schema/src/com/cloud/upgrade/dao/Upgrade2214to30.java >>>>>> 58dd916 >>>>>> engine/schema/src/com/cloud/vm/dao/ConsoleProxyDaoImpl.java 5e9c2f0 >>>>>> engine/schema/src/com/cloud/vm/dao/SecondaryStorageVmDaoImpl.java >>>>>> 1f382d6 >>>>>> >>>>>> engine/storage/src/org/apache/cloudstack/storage/datastore/DataObjectM >>>>>> an >>>>>> agerImpl.java 6ed1274 >>>>>> >>>>>> framework/ipc/src/org/apache/cloudstack/framework/serializer/OnwireCla >>>>>> ss >>>>>> Registry.java 83c8a42 >>>>>> >>>>>> plugins/hypervisors/xen/src/com/cloud/hypervisor/xen/discoverer/XcpSer >>>>>> ve >>>>>> rDiscoverer.java 0ad6dc4 >>>>>> >>>>>> plugins/hypervisors/xen/src/com/cloud/hypervisor/xen/resource/XenServe >>>>>> rC >>>>>> onnectionPool.java b779085 >>>>>> >>>>>> plugins/hypervisors/xen/src/com/cloud/hypervisor/xen/resource/XenServe >>>>>> rS >>>>>> torageProcessor.java e512046 >>>>>> >>>>>> plugins/storage/volume/solidfire/src/org/apache/cloudstack/storage/dat >>>>>> as >>>>>> tore/lifecycle/SolidFirePrimaryDataStoreLifeCycle.java af6a77a >>>>>> server/src/com/cloud/resource/ResourceManagerImpl.java f9a59ba >>>>>> server/src/com/cloud/server/ConfigurationServerImpl.java b8da4c8 >>>>>> >>>>>> services/console-proxy/server/src/com/cloud/consoleproxy/ConsoleProxyT >>>>>> hu >>>>>> mbnailHandler.java 06f21d3 >>>>>> utils/src/com/cloud/utils/net/NetUtils.java 6350986 >>>>>> >>>>>> Diff: https://reviews.apache.org/r/19917/diff/ >>>>>> >>>>>> >>>>>> Testing >>>>>> ------- >>>>>> >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Ding Yuan >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Daan >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Daan >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Daan >>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> Daan >>>>> >>>> >>> >>> >>> >>> -- >>> Daan >>> >> > > > > -- > Daan >