[jira] [Updated] (HIVE-27881) Introduce hive-agents module for trusted instrumentation

Jira Thu, 16 Nov 2023 08:11:17 -0800


     [ https://issues.apache.org/jira/browse/HIVE-27881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


László Bodor updated HIVE-27881:
--------------------------------
    Description: 
From time to time, we face issues that only some runtime magic, like agents or 
the aspect-oriented approach, would help us investigate.

I can recall the following jiras:
HIVE-25806: a socket leak, which was finally investigated using 
https://github.com/jenkinsci/lib-file-leak-detector/
HIVE-26985: an idea about tracking Hive objects, which generated an argument
HIVE-27875: a socket leak again, which then turned out to be solved by 
HIVE-25736 upstream; I had just missed this patch downstream

Basically, using an agent means 2 things:
1) having the agent jar on the local filesystem wherever hive components run
2) adding a javaagent clause to the JVM options

2) should be possible anytime, since that's how we configure our products, 
right? But 1) is simply not possible in containerized environments: even if I 
can create an image and convince a customer to use it, that's a security 
concern. Why would they use an unknown/unofficial image contaminated by an 
unknown agent (like lib-file-leak-detector above)?

Using agents is a good way to instrument our code on demand, and it's crucial 
to make it easily pluggable; otherwise, we're gonna face performance problems 
(guess what happens if you watch and instrument every single socket and save 
their traces by default in your product :) )

I think a set of instrumentation features can be added to a hive module that 
produces a hive-agents.jar, which can be included by default in any hive 
component's JVM args. The javaagent command line args will then drive which 
instrumentation we really want to turn on, like:

{code}
 -javaagent:/lib/hive-agents-x.y.jar=socket-leak-detector,config-detector
{code}
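
As a rough sketch of how the javaagent args could drive this (not the actual 
hive-agents implementation: the class name HiveAgentSketch, the log prefix, and 
the dispatch logic are all assumptions for illustration), the agent's premain 
method could split the option string and enable only the named features. The 
JVM passes everything after the "=" as agentArgs, or null when no options are 
given, and the jar's manifest would need a Premain-Class entry naming the class.

```java
import java.lang.instrument.Instrumentation;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical agent entry point; the issue does not define any class layout.
final class HiveAgentSketch {

    // Invoked by the JVM before main() when the jar is passed via -javaagent.
    public static void premain(String agentArgs, Instrumentation inst) {
        for (String feature : parseFeatures(agentArgs)) {
            switch (feature) {
                case "socket-leak-detector":
                case "config-detector":
                    // A real agent would register a ClassFileTransformer on
                    // 'inst' here; this sketch only reports the decision.
                    System.err.println("[hive-agents] enabling " + feature);
                    break;
                default:
                    System.err.println("[hive-agents] ignoring unknown feature: " + feature);
            }
        }
    }

    // Splits "a,b,c" into an ordered set of feature names.
    // Null or blank input means nothing is instrumented.
    static Set<String> parseFeatures(String agentArgs) {
        if (agentArgs == null || agentArgs.trim().isEmpty()) {
            return Collections.emptySet();
        }
        return Arrays.stream(agentArgs.split(","))
                .map(String::trim)
                .filter(name -> !name.isEmpty())
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }
}
```

With this shape, a JVM started with the agent jar but without any options does 
no instrumentation at all, which keeps the default overhead close to zero.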

  was:
From time-to-time, we face issues where only some runtime magic would help us 
investigate the problems, like agents or the aspect-oriented approach.

I can recall the following jiras:
HIVE-25806: socket leak, that was investigated finally by 
https://github.com/jenkinsci/lib-file-leak-detector/
HIVE-26985: an idea about tracking Hive objects, that generated argument
HIVE-27875: a socket leak again, which then turned out to be solved by 
HIVE-25736 upstream, I just missed this patch downstream

Basically, using an agent means 2 things:
1) having the agent jar wherever hive components run
2) adding a java agent clause to the JVM options

2) should be possible anytime, that's how we configure our products, but 1) is 
simply not possible in containerized environments: even if I can create an 
image + convince a customer to use that, that's a security concern, why would 
they use an unknown/unofficial image contaminated by an unknown agent (like 
lib-file-leak-detector above)

Using agents is a good way to instrument our code on-demand, and it's crucial 
to make it easily pluggable, otherwise, we're gonna face performance problems 
(guess what happens if you watch and instrument every single socket and save 
their traces by default in your product :) )

I think a set of instrumentation functionalities can be added to a hive module, 
which then leads to a hive-agents.jar, which by default can be included in any 
hive component's JVM args. The javaagent command line args will then drive what 
instrumentation we really want to turn on, like:

{code}
 -javaagent:/lib/hive-agents-x.y.jar=socket-leak-detector,config-detector
{code}


> Introduce hive-agents module for trusted instrumentation
> --------------------------------------------------------
>
>                 Key: HIVE-27881
>                 URL: https://issues.apache.org/jira/browse/HIVE-27881
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: László Bodor
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
