Thanks both, Using yet another GFSH command (*export stack-traces*) when the system is already a distributed deadlock doesn't seem right :-). On the other hand, *jstack* and *jcmd* both work as expected, but my goal was to make *KILL -QUIT / KILL -3* work, it's easier and widely known. I'll keep working on it. Best regards.
On Wed, Dec 20, 2017 at 3:38 PM, Kirk Lund <kl...@apache.org> wrote: > Sorry, my info on redirecting of stdout and stderr was stale (circa GemFire > 7.0). Now that the code base is an Apache project, I believe redirecting of > stdout and stderr no longer occurs. However, it does shut off the Log4j2 > ConsoleAppender to stdout when the LogWriterAppender is activated and turns > it back on when the LogWriterAppender is closed. The old code base > redirected stdout and stderr using native code which is not in Apache > Geode. > > If you want to dump stack traces, you need to use the GFSH command "export > stack-traces". > > On Wed, Dec 20, 2017 at 1:33 AM, Juan José Ramos <jra...@pivotal.io> > wrote: > > > Hello Kirk, > > > > Thanks for your reply. Please find my answers below, in line for an > easier > > follow-up. > > > > *>> By default Geode redirects stdout and stderr to the value of the > > log-file property when the Cache starts. If log-file is set to "" then it > > won't redirect them.* > > > > This doesn't seem to be true at the moment, I've tested the behavior > > before sending the email and creating the JIRA. If you start a basic > server > > with the defaults + *log-file*, the output written from a > > function/listener to stdout/stderr is lost, nothing is printed on the > > member's logs. > > > > > > *>> So we reluctantly made the code redirect stdout and stderr when > > LogWriterAppender is registered (at the same point in Cache startup where > > the behavior originally existed). If you specify a custom log4j2.xml then > > this behavior does NOT happen.* > > > > Also tested this one and it doesn't seem to be true, the result is the > same > > as before: the output written to *stderr/stdout* is lost, nothing is > > printed on the member's logs. > > > > > > *>> So a user currently has two options:* > > *>> 1) keep the old behavior,* > > *>> 2) use a custom log4j2.xml which can be specified using the Log4J 2 > > system property "log4j.configurationFile".* > > > > The attached zip file contains a reproducible scenario (you just need to > > modify *setenv.txt* and execute *reproduce.sh*) showing that both > > approaches fail, the output from *System.out.println()* and > > *System.err.println()* is lost, along with the thread dump generated by > *kill > > -3 PID*. > > Hope this is clear now, hope you can answer the questions from my > previous > > email so I can proceed with the fix. > > Best regards > > > > > > On Tue, Dec 19, 2017 at 2:56 PM, Kirk Lund <kl...@apache.org> wrote: > > > >> By default Geode redirects stdout and stderr to the value of the > log-file > >> property when the Cache starts. If log-file is set to "" then it won't > >> redirect them. > >> > >> When we changed GemFire to use Log4J2 internally, we tried to leave > stdout > >> and stderr alone such that the LogWriterAppender would append log > >> statements to the log-file and it would leave the stdout appender > >> registered and appending to stdout. This was a change in behavior from > the > >> previous version of GemFire, and the management at that time decided > that > >> it needed to maintain the old behavior, so we reluctantly made the code > >> redirect stdout and stderr when LogWriterAppender is registered (at the > >> same point in Cache startup where the behavior originally existed). > >> > >> If you specify a custom log4j2.xml then this behavior does NOT happen. > So > >> a > >> User currently has two options: 1) keep the old behavior, 2) use a > custom > >> log4j2.xml which can be specified using the Log4J 2 system property > >> "log4j.configurationFile". > >> > >> If you copy geode-core/src/main/resources/log4j2.xml for use as the > >> starting basis for your log4j2.xml then you must NOT keep this line (or > >> set > >> it to false): > >> > >> <Property name="geode-default">true</Property> > >> > >> That is the line that Geode uses to determine that it is using its own > >> default log4j2.xml and this enables redirecting of stdout and stderr > when > >> the LogWriterAppender starts up. > >> > >> On Tue, Dec 19, 2017 at 7:11 AM, Ju@N <jujora...@gmail.com> wrote: > >> > >> > Hello Geode devs, > >> > > >> > Currently GEODE is "swallowing" all output sent to stdout and stderr > by > >> > default and there's no way of changing this behavior when starting > >> members > >> > through *gfsh*. > >> > This, between other things, prevents users, between other things, from > >> > playing around with *System.out.println()* during development phases > and > >> > getting thread dumps by executing a plain *kill -3* or *kill -QUIT* > >> using > >> > the processId, which is critical in troubleshooting. > >> > Currently there are two internal flags that can be used to prevent > this > >> > default behavior, both have to be used at the same time and both are > >> very > >> > counterintuitive: *gemfire.OSProcess.ENABLE_OUTPUT_REDIRECTION=true* > >> and > >> > *gemfire.OSProcess.DISABLE_OUTPUT_REDIRECTION=false*. These flags, > >> > however, > >> > don't work when starting members through *gfsh*, and that's because > the > >> > relevant commands wrongly assume that the flags are already part of > the > >> > system properties too early in the lifecycle execution of the command: > >> > > >> > > >> > *StartXXXXXCommand.java* > >> > > >> > @CliCommand(value = CliStrings.START_XXXXX, help = > >> > CliStrings.START_XXXXX__HELP) > >> > > >> > @CliMetaData(shellOnly = true, relatedTopic = > >> > {CliStrings.TOPIC_GEODE_XXXXX, CliStrings.TOPIC_GEODE_LIFECYCLE}) > >> > public Result startXXXXX(...) throws Exception { > >> > (...) > >> > final boolean redirectOutput = > >> > Boolean.getBoolean(OSProcess.ENABLE_OUTPUT_REDIRECTION_PROPERTY); > >> > XXXXXLauncher.Builder serverXXXXXBuilder = > >> > new XXXXXLauncher.Builder() > >> > .setRedirectOutput(redirectOutput) > >> > (...) > >> > > >> > } > >> > > >> > At this stage during the execution, the system properties used when > >> > starting the members haven't been fully parsed yet and the flags are > >> only > >> > present within the sun.java.command system property, so > >> > *Boolean.getBoolean(OSProcess.ENABLE_OUTPUT_REDIRECTION_PROPERTY)* > will > >> > always return *false*. There's a JIRA created with this same > >> description, > >> > and I've started to work on a fix for it: GEODE-4101 > >> > <https://issues.apache.org/jira/browse/GEODE-4101>. > >> > > >> > The proposal would be to add a new flag, *--redirect-ouput*, to the > >> start > >> > commands in GFSH and deprecate the properties > >> > *OSProcess.DISABLE_OUTPUT_REDIRECTION* and > >> > *OSProcess.ENABLE_OUTPUT_REDIRECTION*. To avoid major code changes > the > >> > start commands will have this new flag as a parameter and will also > set > >> as > >> > *true* a new internal system property > >> > *OSProcess.DISABLE_REDIRECTION_CONFIGURATION* which, as it names > >> implies, > >> > will disable the other two properties when set. In the next major > >> release, > >> > the three properties should be deleted without major changes. Do you > see > >> > any flaws here?. > >> > > >> > I've tested these changes and the output from *System.out.println()* > >> (from > >> > a function or listener, as an example) goes to the member's log file > as > >> > expected. However, no matter what I do, I can't get the output from > >> *kill > >> > -3 / kill -QUIT*, nor can I find a place within the source code where > >> this > >> > signal is caught to explain why the thread dump is not printed in the > >> > member's log file. Am I missing something?. > >> > > >> > Last, but not least, when redirecting *stdout/stderr* within a locator > >> with > >> > pulse embedded, all of the deploy steps get logged using a different > >> format > >> > (*this was being swallowed before*): > >> > > >> > ... > >> > [info 2017/12/19 11:12:12.123 ART locator1 <main> tid=0x1] > >> > Initializing Spring root WebApplicationContext > >> > Dec 19, 2017 11:12:12 AM org.springframework.web. > context.ContextLoader > >> > initWebApplicationContext > >> > INFO: Root WebApplicationContext: initialization started > >> > Dec 19, 2017 11:12:12 AM > >> > org.springframework.web.context.support.XmlWebApplicationContext > >> > prepareRefresh > >> > INFO: Refreshing Root WebApplicationContext: startup date [Tue Dec 19 > >> > 11:12:12 ART 2017]; root of context hierarchy > >> > Dec 19, 2017 11:12:12 AM > >> > org.springframework.beans.factory.xml.XmlBeanDefinitionReader > >> > loadBeanDefinitions > >> > INFO: Loading XML bean definitions from ServletContext resource > >> > [/WEB-INF/mvc-dispatcher-servlet.xml] > >> > Dec 19, 2017 11:12:12 AM > >> > org.springframework.beans.factory.xml.XmlBeanDefinitionReader > >> > loadBeanDefinitions > >> > INFO: Loading XML bean definitions from ServletContext resource > >> > [/WEB-INF/spring-security.xml] > >> > ... > >> > > >> > This probably happens because jetty uses *StdErrLog* by default and > >> > *log4j2* gets > >> > reconfigured using the *log4j2.xml* file from pulse (ignoring the > format > >> > and options defined by *org.apache.geode.internal. > logging.LogService*). > >> > What would be the recommended approach here?, add *geode-core* as a > >> compile > >> > dependency of *geode-pulse *and directly use *LogService* instead of > the > >> > default *LogManager*?, define a custom *LogService* in *geode-pulse > *to > >> > check (JMX maybe?) whether there's a parent context defined already > and > >> use > >> > it instead of *LogManager*?, tweak *JettyHelper* to, somehow, threat > >> > *pulse* differently > >> > and disable this deploy logging as it happens today?, leave it as it > is > >> and > >> > create a new JIRA to address this separately (maybe moving the > internal > >> > logging to a separate module)?. > >> > Best regards > >> > > >> > > > > > > > > -- > > Juan José Ramos Cassella > > Senior Technical Support Engineer > > Email: jra...@pivotal.io > > Office#: +353 21 4238611 <+353%2021%20423%208611> > > Mobile#: +353 87 2074066 <+353%2087%20207%204066> > > After Hours Contact#: +1 877 477 2269 <(877)%20477-2269> > > Office Hours: Mon - Thu 08:30 - 17:00 GMT. Fri 08:30 - 16:00 GMT > > How to upload artifacts: https://support.pivotal.io/hc/ > > en-us/articles/204369073 > > How to escalate a ticket: https://support.pivotal.io/hc/ > > en-us/articles/203809556 > > > > [image: support] <https://support.pivotal.io/> [image: twitter] > > <https://twitter.com/pivotal> [image: linkedin] > > <https://www.linkedin.com/company/3048967> [image: facebook] > > <https://www.facebook.com/pivotalsoftware> [image: google plus] > > <https://plus.google.com/+Pivotal> [image: youtube] > > <https://www.youtube.com/playlist?list=PLAdzTan_ > eSPScpj2J50ErtzR9ANSzv3kl> > > > -- Juan José Ramos Cassella Senior Technical Support Engineer Email: jra...@pivotal.io Office#: +353 21 4238611 Mobile#: +353 87 2074066 After Hours Contact#: +1 877 477 2269 Office Hours: Mon - Thu 08:30 - 17:00 GMT. Fri 08:30 - 16:00 GMT How to upload artifacts: https://support.pivotal.io/hc/en-us/articles/204369073 How to escalate a ticket: https://support.pivotal.io/hc/en-us/articles/203809556 [image: support] <https://support.pivotal.io/> [image: twitter] <https://twitter.com/pivotal> [image: linkedin] <https://www.linkedin.com/company/3048967> [image: facebook] <https://www.facebook.com/pivotalsoftware> [image: google plus] <https://plus.google.com/+Pivotal> [image: youtube] <https://www.youtube.com/playlist?list=PLAdzTan_eSPScpj2J50ErtzR9ANSzv3kl>