[JPR] Hi Dave, Tim,
This kind of crash is always difficult to track down, because it is not easily reproducible. From what I see (and as Tim pointed out), there seems to be a memory problem that is revealed in the process LabProjects List. But a memory problem can occur well before the actual crash, because the application may have corrupted memory and not be aware of it until the crash.

- Is your application compiled? If yes, be sure that the Range checking option is set.
- Is the LabProjects List process a client process on the server, or a worker or process running on the server?
- The time of the crash seems irrelevant, but maybe it's linked to a peak in activity and server or network stress?
- A client problem causing a server crash is unlikely, but it may help to know if there is a correlation between the crash and a particular client doing a particular operation.
- Do you know which method is executed when it crashes?
- Do you use interprocess variables, like arrays for instance?
- How much memory has been given to the server and to the cache?

This is just a short list of points to check, but it may help to narrow the problem down to a small part of the application.

My very best,
JPR

> On 2 Sep 2018, at 21:00, 4d_tech-requ...@lists.4d.com wrote:
>
> From: Tim Nevels <timnev...@mac.com>
> To: 4d_tech@lists.4d.com
> Subject: Re: Isolating the Cause of a Server Crash
> Message-ID: <be3bf13d-9f79-4715-aadf-240c4c189...@mac.com>
> Content-Type: text/plain; charset=utf-8
>
> On Sep 1, 2018, at 2:00 PM, Dave Nasralla wrote:
>
>> One of our systems is crashing about every 3 days and I can't seem to
>> isolate the cause. Lately these are crashes with a Mac crash report
>> appearing on the screen.
>> Some system details are:
>> - 4D Built Server app with v17.0 HF1 (64 bit Server with 64 Mac and
>> 32 bit Windows Clients)
>> - Mac and Windows Clients
>> - Mac OS 10.13.5
>>
>> What I know so far:
>> - I have the Server Debug file. It ends with a "." and so the last
>> command appears to have executed.
>> - I'm using the Report Info component, logging every 5 minutes. There
>> don't seem to be memory problems or runaway cache issues.
>> - I also know who was on each time it crashes and sent out an email
>> to those users to find patterns (so far I've found none).
>> - The crashes typically happen around 10am to 11am.
>> - The client and server builds match.
>>
>> I'm debating turning on the client debugger files and then harvesting
>> them afterwards when the user logs back in. I'm open to other
>> debugging techniques.
>>
>> There are other v17 systems running on the same machine with zero issues.
>>
>> Below is a snippet of the crash report. It seems to be different each
>> time, but here is the latest. Thread 73 crashed, so I only included
>> that one.
>>
>> Thanks,
>>
>> dave nasralla
>> ------------------------------------
>> Process: Corporate [93958]
>> Path: /Users/USER/*/Corporate
>> Server.app/Contents/MacOS/Corporate
>> Identifier: 4d.com.Corporate Server.app
>> Version: 17.0 build 17.226566 (???)
>> Code Type: X86-64 (Native)
>> Parent Process: ??? [1]
>> Responsible: Corporate [93958]
>> User ID: 501
>>
>> Date/Time: 2018-08-31 11:00:05.952 -0500
>> OS Version: Mac OS X 10.13.5 (17F77)
>> Report Version: 12
>> Anonymous UUID: 723511FD-4CA0-6E8B-0642-883209248DFC
>>
>>
>> Time Awake Since Boot: 3700000 seconds
>>
>> System Integrity Protection: enabled
>>
>> Crashed Thread: 73 LabProjects List (id = -114)
>>
>> Exception Type: EXC_BAD_ACCESS (SIGSEGV)
>> Exception Codes: EXC_I386_GPFLT
>> Exception Note: EXC_CORPSE_NOTIFY
>>
>> Termination Signal: Segmentation fault: 11
>> Termination Reason: Namespace SIGNAL, Code 0xb
>> Terminating Process: exc handler [0]
>> ----------------------------------------------------------
>>
>>
>> Thread 73 Crashed:: LabProjects List (id = -114)
>> 0 4d.com.Corporate Server.app 0x000000010694fdbe
>> V4DConnection::OnPostpone(bool) + 40
>> 1 4d.com.Corporate Server.app 0x0000000106b095f7
>> V4DServerUser::PostponeServiceConnection() + 35
>> 2 4d.com.Corporate Server.app 0x0000000106b20567
>> V4DServer::exec_ConnectionPostpone(V4DRequestReply&, V4DTaskConcrete*,
>> short) + 395
>> 3 4d.com.Corporate Server.app 0x0000000106b211ca
>> V4DServer::exec_streamreq(V4DRequestReply&, V4DTaskConcrete*) + 100
>
> Hi Dave,
>
> Crashing every 3 days is a real problem and totally unacceptable. So what can
> be done to try and make this situation better? We need to make changes to
> make this crashing stop. But what changes?
>
> Here is my thinking as I read this crash report. Keep in mind I’m not an
> expert on this, so I may be wrong in some areas. If I am wrong, hopefully
> those that know more can correct me, and in turn help me and others
> understand more about how to read these macOS crash reports. (Thinking about
> Miyako, JPR, Christian Sakowski and Rob Laveaux: they are real experts in
> this area. Real macOS programmers that know how to read these things
> properly.)
>
> The crash report is supposed to provide a programmer with information on
> exactly where the program crashed and the cause of the crash. If you have the
> special 4D “debug” version it will contain more “symbols” and thus when 4D
> crashes you get better names for functions instead of just memory address
> offsets. I think you even get the 4D command names that were involved in the
> crash. But the basic crash dump info that we have here can help point to the
> general area of concern. Here is a website that helps explain crash dumps and
> how to read them:
>
> https://www.maketecheasier.com/read-macos-crash-reports-troubleshoot-mac/
>
> This is 4D v17.0 build 226566 that is running compiled in 64bit mode (Code
> Type: x86-64). So my first thought is that this could be a 4D 64bit issue.
> That’s important because some of the code is completely different between
> 32bit 4D and 64bit 4D. The 64bit code could be newly written code, while the
> 32bit code could be legacy code that has been around for years.
>
> Thread 73 “LabProjects List” is what crashed. Do you have a table named
> “LabProjects” or maybe a MODIFY SELECTION or a listbox window that shows
> records in this table? Or a process that has that name? Makes me think that
> you do. That’s another pointer to where in your application the crashing
> problem occurred.
>
> Exception Type is “EXC_BAD_ACCESS (SIGSEGV)” and that means “the program
> attempts to access memory incorrectly or with an invalid address”. Could be a
> C pointer that went bad or something to do with virtual memory or even how 4D
> allocates its own memory internally. Could be 4D data cache related.
> Basically 4D tried to access memory it was not allowed to access and macOS
> killed 4D so that it could not damage other parts of the system and cause
> them to crash. Thank you macOS for watching out and protecting us from
> complete system corruption and crashing. Windows does this too.
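The header fields discussed so far (version and build, code type, crashed thread, exception type) can be pulled out of a saved crash report with a few lines of Python. This is only a rough sketch: SAMPLE_REPORT is abridged from the report quoted in this thread, and the names summarize_header and build_number are made up for illustration, not part of any 4D or Apple tool.

```python
import re

# Abridged from the crash report quoted in this thread (spacing is illustrative).
SAMPLE_REPORT = """\
Process:               Corporate [93958]
Version:               17.0 build 17.226566 (???)
Code Type:             X86-64 (Native)
Crashed Thread:        73  LabProjects List (id = -114)
Exception Type:        EXC_BAD_ACCESS (SIGSEGV)
Exception Codes:       EXC_I386_GPFLT
"""

# Header fields worth looking at first when triaging a crash.
FIELDS = ("Version", "Code Type", "Crashed Thread", "Exception Type", "Exception Codes")


def summarize_header(report_text):
    """Return the selected 'Key: value' header fields as a dict (hypothetical helper)."""
    summary = {}
    for line in report_text.splitlines():
        key, sep, value = line.partition(":")  # split at the first colon only
        key = key.strip()
        if sep and key in FIELDS:
            summary[key] = " ".join(value.split())  # collapse column padding
    return summary


def build_number(version_field):
    """Extract the build number from a field such as '17.0 build 17.226566 (???)'."""
    match = re.search(r"build\s+\d+\.(\d+)", version_field)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    header = summarize_header(SAMPLE_REPORT)
    for name, value in header.items():
        print(f"{name}: {value}")
    print("Build:", build_number(header["Version"]))
```

Running the same function over every crash report from the server makes it easy to see whether the crashed thread and exception type stay the same from one crash to the next.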
>
> The last area is where we can see exactly where in 4D, and even which 4D C or
> Objective C function, was running when macOS said “enough, this
> application has gone crazy, I need to kill it before it does damage to other
> applications.” The functions are listed in reverse chronological order, so
> the one at the bottom is where the “call chain” started. The one at the top
> is where it died.
>
> The function name is “V4DConnection::OnPostpone(bool)” and the code 40
> bytes from the start of that function is where the offending memory
> access occurred. The name “V4DConnection” makes me think this is related
> to networking, 4D Server handling network actions with 4D Client. The
> “OnPostpone” makes me think this is somehow related to sleeping or a 4D
> Client connection that has been asleep and needs to now wake up. And lastly
> it makes me think “this is related to the new network layer code”. Again, this
> is just my thinking. I could be completely wrong about all of this.
>
> So now my brain tries to build the most likely scenario that
> could be connected to this situation. Happens during the day between 10am and
> 11am. It’s a work day with users connected. People came in to work, got
> connected to 4D Server, then wandered off to a meeting or something and their
> computer went to sleep. You are using 4D Server compiled 64bit so you MUST be
> using the new network layer. Legacy is only available in 32bit compiled 4D
> Server macOS.
>
> There is this new network layer feature where if a 4D Client machine goes
> into sleep mode you don’t lose your 4D Server connection. When the
> user wakes up the 4D Client machine it notifies 4D Server and the old network
> connection is reenergized and brought back to life. That “OnPostpone” mention
> above makes me think this also. Maybe something went wrong in that area of
> 4D. It is a tricky area because sleep could last for hours or days and memory
> could be moved around and pointers can easily go bad in those types of
> situations.
>
> So there is my analysis. Now what changes could you make to stop these damn
> crashing situations? Here are some ideas:
>
> - You say it happens about every 3 days, so just restart 4D Server every
> single day. Giant PITA I know. But just an idea for what to do now to
> eliminate the crashing.
>
> - Stop all 4D Client machines from sleeping. You’d have to physically go to
> every machine and turn off system sleeping and allow the display to go to
> sleep. You can’t rely on users to do this, and do it right. This is what I
> would do, if I had physical access to all the machines (or at least RDP
> access) so that I could make sure every machine had system sleep turned off.
> (Of course you already have App Napping turned off on the 4D Server machine
> so that’s not part of this issue, right?)
>
> - Crash dump lists Build Number 226566. v17.0 has build 225365. v17.0 HF1 has
> build 226237. A quick check of 4D forums “Nightly Builds 4D v17” shows this
> build is from 8/22/18. So you are running a nightly build. I’m guessing you
> used v17.0 and had problems, went to v17.0 HF1 and still had problems, so you
> went to nightly builds to try and find a fix. Maybe you keep doing that.
> Current nightly build is 226837. You may find they’ve fixed the bug that is
> biting you.
>
> - Stop using the new network layer. You would have to stop using 64bit 4D
> Server so that may not be a viable option. You are limited to a 2GB data
> cache. But maybe if you can stop the crashing now it’s worth that limitation.
> That means compiling a 32bit version of 4D Server and 4D Client, and
> replacing all the 64bit 4D Client applications with the 32bit version. I
> think you could use the auto client update feature to automate this.
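The way of reading the crashed thread described in the message above (frames listed most recent first, each with a function name and a byte offset into that function) can be sketched in Python. This is a minimal illustration, assuming frame lines shaped like the four quoted in this thread with the mail wrapping undone; parse_frames and call_chain are hypothetical helper names, and the regular expression is a guess at the layout rather than an official grammar for macOS crash reports.

```python
import re

# The four frames from the crash report quoted in this thread, unwrapped.
CRASHED_THREAD = """\
0   4d.com.Corporate Server.app  0x000000010694fdbe V4DConnection::OnPostpone(bool) + 40
1   4d.com.Corporate Server.app  0x0000000106b095f7 V4DServerUser::PostponeServiceConnection() + 35
2   4d.com.Corporate Server.app  0x0000000106b20567 V4DServer::exec_ConnectionPostpone(V4DRequestReply&, V4DTaskConcrete*, short) + 395
3   4d.com.Corporate Server.app  0x0000000106b211ca V4DServer::exec_streamreq(V4DRequestReply&, V4DTaskConcrete*) + 100
"""

# Assumed layout: frame index, image name, hex address, symbol, "+", byte offset.
FRAME_RE = re.compile(
    r"^(?P<index>\d+)\s+(?P<image>.+?)\s+(?P<address>0x[0-9a-fA-F]+)"
    r"\s+(?P<symbol>.+?)\s+\+\s+(?P<offset>\d+)$"
)


def parse_frames(text):
    """Parse backtrace lines into dicts; lines that do not match are skipped."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append({
                "index": int(match["index"]),
                "image": match["image"],
                "address": int(match["address"], 16),
                "symbol": match["symbol"],
                "offset": int(match["offset"]),
            })
    return frames


def call_chain(frames):
    """Return symbols oldest call first, so the last entry is where it crashed."""
    ordered = sorted(frames, key=lambda frame: frame["index"], reverse=True)
    return [frame["symbol"] for frame in ordered]


if __name__ == "__main__":
    frames = parse_frames(CRASHED_THREAD)
    for depth, symbol in enumerate(call_chain(frames)):
        print("  " * depth + symbol)
    top = frames[0]
    # The frame address minus the offset gives the start of the function.
    print(f"crashed {top['offset']} bytes into {top['symbol']}, "
          f"function starts at {top['address'] - top['offset']:#x}")
```

Printing the chain oldest call first shows the request handler at the top and OnPostpone at the bottom, which is the order the calls actually happened in.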