[ https://issues.apache.org/jira/browse/HIVE-27881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
László Bodor updated HIVE-27881:
--------------------------------
    Description:
From time-to-time, we face issues where only some runtime magic would help us investigate the problems, like agents or the aspect-oriented approach.
I can recall the following jiras:
HIVE-25806: socket leak, that was investigated finally by https://github.com/jenkinsci/lib-file-leak-detector/
HIVE-26985: an idea about tracking Hive objects, that generated an argument about how to achieve that
HIVE-27875: a socket leak again, which then turned out to be solved by HIVE-25736 upstream, I just missed this patch downstream

Basically, using an agent means 2 things:
1) having the agent jar on local filesystem wherever hive components run
2) adding a javaagent clause to the JVM options

2) should be possible anytime, that's how we configure our products, right?
but 1) is simply not possible in containerized environments: even if I can create an image + convince a customer to use that, that's a security concern, why would they use an unknown/unofficial image contaminated by an unknown agent (like lib-file-leak-detector above)

Using agents is a good way to instrument our code on-demand, and it's crucial to make it easily pluggable, otherwise, we're gonna face performance problems (guess what happens if you watch and instrument every single socket and save their traces by default in your product :) )

I think a set of instrumentation functionalities can be added to a hive module, which then leads to a hive-agents.jar, which by default can be included in any hive component's JVM args. The javaagent command line args will then drive what instrumentation we really want to turn on, like:
{code}
-javaagent:/lib/hive-agents-x.y.jar=socket-leak-detector,config-detector
{code}

  was:
From time-to-time, we face issues where only some runtime magic would help us investigate the problems, like agents or the aspect-oriented approach.
I can recall the following jiras:
HIVE-25806: socket leak, that was investigated finally by https://github.com/jenkinsci/lib-file-leak-detector/
HIVE-26985: an idea about tracking Hive objects, that generated argument
HIVE-27875: a socket leak again, which then turned out to be solved by HIVE-25736 upstream, I just missed this patch downstream

Basically, using an agent means 2 things:
1) having the agent jar on local filesystem wherever hive components run
2) adding a javaagent clause to the JVM options

2) should be possible anytime, that's how we configure our products, right?
but 1) is simply not possible in containerized environments: even if I can create an image + convince a customer to use that, that's a security concern, why would they use an unknown/unofficial image contaminated by an unknown agent (like lib-file-leak-detector above)

Using agents is a good way to instrument our code on-demand, and it's crucial to make it easily pluggable, otherwise, we're gonna face performance problems (guess what happens if you watch and instrument every single socket and save their traces by default in your product :) )

I think a set of instrumentation functionalities can be added to a hive module, which then leads to a hive-agents.jar, which by default can be included in any hive component's JVM args. The javaagent command line args will then drive what instrumentation we really want to turn on, like:
{code}
-javaagent:/lib/hive-agents-x.y.jar=socket-leak-detector,config-detector
{code}


> Introduce hive-agents module for trusted instrumentation
> --------------------------------------------------------
>
>                 Key: HIVE-27881
>                 URL: https://issues.apache.org/jira/browse/HIVE-27881
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: László Bodor
>            Priority: Major
>
> From time-to-time, we face issues where only some runtime magic would help us
> investigate the problems, like agents or the aspect-oriented approach.
> I can recall the following jiras:
> HIVE-25806: socket leak, that was investigated finally by
> https://github.com/jenkinsci/lib-file-leak-detector/
> HIVE-26985: an idea about tracking Hive objects, that generated an argument
> about how to achieve that
> HIVE-27875: a socket leak again, which then turned out to be solved by
> HIVE-25736 upstream, I just missed this patch downstream
> Basically, using an agent means 2 things:
> 1) having the agent jar on local filesystem wherever hive components run
> 2) adding a javaagent clause to the JVM options
> 2) should be possible anytime, that's how we configure our products, right?
> but 1) is simply not possible in containerized environments: even if I can
> create an image + convince a customer to use that, that's a security concern,
> why would they use an unknown/unofficial image contaminated by an unknown
> agent (like lib-file-leak-detector above)
> Using agents is a good way to instrument our code on-demand, and it's crucial
> to make it easily pluggable, otherwise, we're gonna face performance problems
> (guess what happens if you watch and instrument every single socket and save
> their traces by default in your product :) )
> I think a set of instrumentation functionalities can be added to a hive
> module, which then leads to a hive-agents.jar, which by default can be
> included in any hive component's JVM args. The javaagent command line args
> will then drive what instrumentation we really want to turn on, like:
> {code}
> -javaagent:/lib/hive-agents-x.y.jar=socket-leak-detector,config-detector
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
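
For illustration only, the idea in the description that the javaagent command line args drive which instrumentation is turned on could be sketched roughly as below. This is not code from Hive: the class name `HiveAgentSketch`, the `parseFeatures` helper and the `KNOWN` set are hypothetical, and the two feature names are simply the ones used in the example option above. It relies only on the standard `java.lang.instrument` contract, where the JVM passes everything after `=` in the `-javaagent` option to `premain` as one string, and the agent jar's manifest names the class in a `Premain-Class` attribute.

```java
import java.lang.instrument.Instrumentation;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical entry point for a hive-agents jar; all names here are illustrative.
public final class HiveAgentSketch {

    // Assumption: the feature names from the example -javaagent option in the description.
    static final Set<String> KNOWN = Set.of("socket-leak-detector", "config-detector");

    // Splits the agent argument string on commas and keeps only recognized feature names.
    static Set<String> parseFeatures(String agentArgs) {
        Set<String> enabled = new LinkedHashSet<>();
        if (agentArgs == null || agentArgs.trim().isEmpty()) {
            return enabled; // no arguments: nothing is instrumented
        }
        for (String name : agentArgs.split(",")) {
            String trimmed = name.trim();
            if (KNOWN.contains(trimmed)) {
                enabled.add(trimmed);
            }
        }
        return enabled;
    }

    // Called by the JVM before main() when the jar is listed in a -javaagent option.
    public static void premain(String agentArgs, Instrumentation inst) {
        for (String feature : parseFeatures(agentArgs)) {
            System.err.println("hive-agents: enabling " + feature);
            // A real agent would register a ClassFileTransformer for this feature
            // here, e.g. through inst.addTransformer(...). Omitted in this sketch.
        }
    }
}
```

With this shape, a jar that is always present on the JVM command line stays inert unless a feature is named explicitly, which matches the concern about not instrumenting every socket by default.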