[ https://issues.apache.org/jira/browse/HIVE-27881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
László Bodor updated HIVE-27881:
--------------------------------
Description:
From time to time, we face issues where only some runtime magic would help us
investigate the problem, like agents or the aspect-oriented approach.
I can recall the following jiras:
HIVE-25806: a socket leak that was finally investigated with
https://github.com/jenkinsci/lib-file-leak-detector/
HIVE-26985: an idea about tracking Hive objects, which generated some debate
HIVE-27875: another socket leak, which turned out to be already fixed upstream
by HIVE-25736; I had just missed that patch downstream
Basically, using an agent requires two things:
1) having the agent jar on the local filesystem wherever Hive components run
2) adding a -javaagent clause to the JVM options
2) should always be possible, since that's how we configure our products anyway.
But 1) is simply not possible in containerized environments: even if I could
build an image and convince a customer to use it, that's a security concern.
Why would they run an unknown/unofficial image contaminated by an unknown agent
(like lib-file-leak-detector above)?
Using agents is a good way to instrument our code on demand, but it's crucial
to make them easily pluggable; otherwise we're going to face performance
problems (guess what happens if you watch and instrument every single socket
and save their traces by default in your product :) )
I think a set of instrumentation features could be added to a new Hive module
that produces a hive-agents.jar, which could be included by default in the JVM
args of every Hive component. The javaagent command-line arguments would then
select which instrumentation is actually turned on, like:
{code}
-javaagent:/lib/hive-agents-x.y.jar=socket-leak-detector,config-detector
{code}
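A minimal sketch of how such a jar could turn the agent arguments into a set of enabled instrumentations. The class name HiveAgents, the registry, and the handling of unknown names are assumptions for illustration only; the standard java.lang.instrument pieces are the premain(String, Instrumentation) entry point and the Premain-Class manifest attribute.

```java
import java.lang.instrument.Instrumentation;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

/**
 * Hypothetical entry point for a hive-agents.jar. The jar's MANIFEST.MF would need
 *   Premain-Class: HiveAgents
 * so that the JVM calls premain() when started with
 *   -javaagent:/lib/hive-agents-x.y.jar=socket-leak-detector,config-detector
 */
public final class HiveAgents {

    // Registry of available instrumentations. The names mirror the example above;
    // the installer bodies are placeholders, not real instrumentation.
    private static final Map<String, Consumer<Instrumentation>> AVAILABLE = new LinkedHashMap<>();
    static {
        AVAILABLE.put("socket-leak-detector", inst -> { /* install socket instrumentation */ });
        AVAILABLE.put("config-detector", inst -> { /* install config instrumentation */ });
    }

    // JVM hook: agentArgs is the text after '=' in -javaagent (may be null or empty).
    public static void premain(String agentArgs, Instrumentation inst) {
        for (String name : parseFeatures(agentArgs)) {
            Consumer<Instrumentation> installer = AVAILABLE.get(name);
            if (installer == null) {
                System.err.println("hive-agents: unknown instrumentation '" + name + "', ignoring");
            } else {
                installer.accept(inst);
            }
        }
    }

    // Splits "a, b,,a" into an ordered, de-duplicated set; nothing is enabled by default.
    static Set<String> parseFeatures(String agentArgs) {
        Set<String> features = new LinkedHashSet<>();
        if (agentArgs == null) {
            return features;
        }
        for (String part : agentArgs.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                features.add(name);
            }
        }
        return features;
    }
}
```

Because nothing is installed unless it is named, the jar could sit on every component's JVM args at practically no cost until someone asks for a specific detector. Unknown names are only logged here; failing fast at startup would be the stricter alternative.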
was:
From time to time, we face issues where only some runtime magic would help us
investigate the problem, like agents or the aspect-oriented approach.
I can recall the following jiras:
HIVE-25806: a socket leak that was finally investigated with
https://github.com/jenkinsci/lib-file-leak-detector/
HIVE-26985: an idea about tracking Hive objects, which generated some debate
HIVE-27875: another socket leak, which turned out to be already fixed upstream
by HIVE-25736; I had just missed that patch downstream
Basically, using an agent requires two things:
1) having the agent jar wherever Hive components run
2) adding a java agent clause to the JVM options
2) should always be possible, since that's how we configure our products, but 1)
is simply not possible in containerized environments: even if I could build an
image and convince a customer to use it, that's a security concern. Why would
they run an unknown/unofficial image contaminated by an unknown agent (like
lib-file-leak-detector above)?
Using agents is a good way to instrument our code on demand, but it's crucial
to make them easily pluggable; otherwise we're going to face performance
problems (guess what happens if you watch and instrument every single socket
and save their traces by default in your product :) )
I think a set of instrumentation features could be added to a new Hive module
that produces a hive-agents.jar, which could be included by default in the JVM
args of every Hive component. The javaagent command-line arguments would then
select which instrumentation is actually turned on, like:
{code}
-javaagent:/lib/hive-agents-x.y.jar=socket-leak-detector,config-detector
{code}
> Introduce hive-agents module for trusted instrumentation
> --------------------------------------------------------
>
> Key: HIVE-27881
> URL: https://issues.apache.org/jira/browse/HIVE-27881
> Project: Hive
> Issue Type: Improvement
> Reporter: László Bodor
> Priority: Major
--
This message was sent by Atlassian Jira
(v8.20.10#820010)