[ 
https://issues.apache.org/jira/browse/TRAFODION-3308?focusedWorklogId=242946&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-242946
 ]

ASF GitHub Bot logged work on TRAFODION-3308:
---------------------------------------------

                Author: ASF GitHub Bot
            Created on: 15/May/19 23:08
            Start Date: 15/May/19 23:08
    Worklog Time Spent: 10m 
      Work Description: DaveBirdsall commented on pull request #1840: 
[TRAFODION-3308] Improve error reporting when a process cannot be started
URL: https://github.com/apache/trafodion/pull/1840
 
 
   This pull request contains improvements to error reporting when a server 
process cannot be started. The following changes are made:
   
   1. Logic in the compiler (optimizer/NodeMap.cpp) to compute node names has 
been corrected. Thanks to @sandhyasun for this change.
   2. The correct node name is now reported in error messages 2012 and 2013 
(common/IpcGuardian.cpp). Formerly, a hard-coded value of "NSK" was reported.
   3. Redundant 2013 and 2002 error messages are no longer generated 
(common/IpcGuardian.cpp and cli/ExSqlComp.cpp).
   4. Interpretive text has been added to error 2012 to give a more 
human-understandable reason behind certain common error codes 
(common/IpcGuardian.cpp and bin/SqlciErrors.txt).
   5. The nodeName_ member in IpcGuardianServer is now computed at run time, 
and only if needed (common/IpcGuardian.cpp). (This happens during error 
reporting only.) This is done because the compile-time value may be different 
in the case of ESPs due to node-down conditions. Former logic that put "NSK" in 
this field has been removed.
   6. The function ComRtGetOSClusterName is an obsolete function that in 
predecessor products returned the cluster name. On Trafodion, it just returns 
the string "NSK". I removed much but not all of the logic that calls this 
function. This resulted in the removal of some dead code 
(executor/ExCancel.cpp, executor/ExExeUtilCommon.cpp).
   7. The method IpcServerClass::getProcessName required the node name in 
predecessor products but does not in Trafodion. I removed the node name and 
node name length parameters from this method and associated code 
(common/Ipc.cpp). I was motivated to do this because the old calling code 
sometimes took strlen(nodeName_) to get the node name length, but due to the 
changes above, nodeName_ is now null in some cases, and calling strlen on a 
null pointer is undefined behavior.
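
   Items 5 and 7 above can be illustrated with a minimal sketch. It is not 
the code in common/IpcGuardian.cpp: the class ServerSketch and the helpers 
lookupNodeNameAtRunTime and safeLength are hypothetical names invented for 
this example, and the "node<N>" naming is an assumption. The sketch only shows 
the general pattern of leaving the node name null until error reporting needs 
it, and why callers can no longer take strlen of the raw member.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for a run-time node name lookup. The real code
// would query the cluster; this is NOT a Trafodion API.
static std::string lookupNodeNameAtRunTime(int nodeNumber)
{
  return "node" + std::to_string(nodeNumber);
}

// Null-safe alternative to strlen(): returns 0 for a null pointer
// instead of invoking undefined behavior.
static std::size_t safeLength(const char *s)
{
  return s ? std::strlen(s) : 0;
}

// Hypothetical class illustrating a lazily computed node name.
class ServerSketch
{
public:
  explicit ServerSketch(int nodeNumber)
    : nodeNumber_(nodeNumber), nodeName_(nullptr)
  {
    assert(nodeNumber >= 0);
  }

  ~ServerSketch() { delete [] nodeName_; }

  ServerSketch(const ServerSketch &) = delete;
  ServerSketch &operator=(const ServerSketch &) = delete;

  // Raw member access: null until an error path asks for the name.
  const char *rawNodeName() const { return nodeName_; }

  // Called from error reporting only: computes the name on first use
  // and caches it for later calls.
  const char *nodeNameForError()
  {
    if (!nodeName_)
    {
      std::string name = lookupNodeNameAtRunTime(nodeNumber_);
      nodeName_ = new char[name.size() + 1];
      std::memcpy(nodeName_, name.c_str(), name.size() + 1);
    }
    return nodeName_;
  }

private:
  int nodeNumber_;
  char *nodeName_;
};
```

   In this sketch, a freshly constructed ServerSketch returns a null pointer 
from rawNodeName(), so any caller that needs a length has to go through 
something like safeLength() or avoid the length altogether, which is the 
approach item 7 takes by dropping the parameters.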
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

            Worklog Id:     (was: 242946)
            Time Spent: 10m
    Remaining Estimate: 0h

> Uninformative error messages when executables are unavailable
> -------------------------------------------------------------
>
>                 Key: TRAFODION-3308
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-3308
>             Project: Apache Trafodion
>          Issue Type: Bug
>          Components: sql-cmp
>    Affects Versions: 2.4
>            Reporter: David Wayne Birdsall
>            Assignee: David Wayne Birdsall
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> When the tdm_arkcmp executable is unavailable on a Trafodion node due to a 
> file system error, we get the following less-than-informative error:
> {quote}>>create schema scythians;
> *** ERROR[2012] Server process tdm_arkcmp could not be created on \\NSK - 
> Operating system error 4022, TPCError = 53, error detail = 0.  (See variants 
> of Seabed procedure msg_mon_start_process for details).
> *** ERROR[2013] Server process tdm_arkcmp could not be created on \\NSK - 
> Operating system error 4022.
> *** ERROR[2002] Internal error: cannot create compiler.
> *** ERROR[8822] The statement was not prepared.
> --- SQL operation failed with errors.
> >> 
> {quote}
> We get a similarly uninformative series of errors when the tdm_arkesp 
> executable is unavailable:
> {quote}>>prepare s1 from select * From t1 where b = 6;
> --- SQL command prepared.
> >>explain options 'f' s1;
> LC   RC   OP   OPERATOR              OPT       DESCRIPTION           CARD
> ---- ---- ---- --------------------  --------  --------------------  ---------
> 2    .    3    root                                                  1.30E+004
> 1    .    2    esp_exchange                    1:2(hash2)            1.30E+004
> .    .    1    trafodion_scan                  T1                    1.30E+004
> --- SQL operation complete.
> >>execute s1;
> *** ERROR[2012] Server process tdm_arkesp could not be created on \NSK cpu 0 
> - Operating system error 4022, TPCError = 53, error detail = 0.  (See 
> variants of Seabed procedure msg_mon_start_process for details).
> *** ERROR[2013] Server process tdm_arkesp could not be created on \NSK cpu 0 
> - Operating system error 4022.
> *** ERROR[2012] Server process tdm_arkesp could not be created on \NSK cpu 0 
> - Operating system error 4022, TPCError = 53, error detail = 0.  (See 
> variants of Seabed procedure msg_mon_start_process for details).
> --- 0 row(s) selected.
> >> 
> {quote}
> Among the issues with these error messages:
>  # They do not give the correct node name where we were trying to create the 
> process, but instead report NSK.
>  # Error 2013 is completely redundant; everything it says is in error 2012.
>  # Error 2012 could be much more informative. Text could be added explaining 
> the meaning of the error codes given.
>  # In the tdm_arkcmp case, error 2002 adds no information at all.
> To reproduce these issues on a development instance, first create a table T1 
> with one million rows (so a parallel plan will be picked for the tdm_arkesp 
> example). Then go to the trafodion/core/sql/lib/linux/64bit/debug directory 
> and rename the tdm_arkcmp and tdm_arkesp executables to something else. Try 
> any DDL command to get the tdm_arkcmp failure. Try any parallel DML statement 
> to get the tdm_arkesp failure.
> It is likely that similar issues exist for other processes, e.g. tdm_udrserv.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
