[ https://issues.apache.org/jira/browse/KYLIN-5700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yaguang Jia updated KYLIN-5700:
-------------------------------
Description:
h2. Background

In the current code, many commands are built by string concatenation and then executed through {{ProcessBuilder}}. Some of the concatenated parameters may come from API requests and are never validated, so these commands are open to injection attacks.

When the Spark command is built, the {{checkCommandInjection}} method is used to prevent injection. However, it only blocks attacks that use backticks or {{$()}}, such as {{`rm -rf /`}} and {{$(rm -rf /)}}. It does not cover other cases, such as {{cat nohup.out2 && echo success || echo failed}}.

h2. Fix Design

Validate parameters when building commands, in the following four scenarios:
# Diagnostic package: the parameters of the {{diag.sh}} script (project, jobId, path, etc.) are concatenated into the command. Each parameter is checked in turn and must match {{^[a-zA-Z0-9_./-]+$}}.
# InfluxDB data export: the database address and the database name are concatenated as arguments to the {{influx}} command. The address must match {{[a-zA-Z0-9._-]+(:[0-9]+)?}} and the database name must match {{^[0-9a-zA-Z_-]+$}}.
# YARN statistics: the YARN URL is concatenated as an argument to the {{curl}} command and must match {{^(http(s)?://)?[a-zA-Z0-9._-]+(:[0-9]+)?(/[a-zA-Z0-9._-]+)*/?$}}.
# Beeline: the {{beeline-params}} from the configuration are concatenated into the command. Their structure is too complex to validate with a pattern, so each parameter value is forcibly wrapped in single quotes, which makes the shell treat it as a literal string; an embedded single quote is written as {{'\''}}. For example, {{abc}} → {{'abc'}} and {{ab'c}} → {{'ab'\''c'}}.
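To illustrate the gap described in the Background, the following Java sketch uses a blocklist that, like the described behavior of {{checkCommandInjection}}, rejects only backticks and {{$(}}. It is a hypothetical re-creation for illustration, not Kylin's actual implementation, and the class name {{InjectionCheckGap}} is made up.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only: NOT Kylin's actual checkCommandInjection.
// It mimics the behavior described in the Background: only backticks and "$(" are rejected.
final class InjectionCheckGap {

    // Blocklist: a backtick, or the two characters "$(".
    private static final Pattern BLOCKLIST = Pattern.compile("`|\\$\\(");

    private InjectionCheckGap() {
    }

    /** Returns true if the value passes the (too narrow) blocklist check. */
    static boolean passesBlocklist(String value) {
        return !BLOCKLIST.matcher(value).find();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "`rm -rf /`",
            "$(rm -rf /)",
            "cat nohup.out2 && echo success || echo failed"
        };
        for (String input : inputs) {
            System.out.println(passesBlocklist(input) + "  <- " + input);
        }
    }
}
```

The first two payloads are rejected, but the chained command passes, because {{&&}} and {{||}} (like {{;}} or {{|}}) are not in the blocklist. This is why the fix switches to allowlist patterns for the scenarios where the input format is known.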
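A minimal sketch of the allowlist checks for scenarios 1 to 3, assuming the parameters are validated in Java before the command is built. The patterns are copied from the Fix Design; the class {{CmdParamValidator}}, the {{require}} helper, the {{diag.sh}} location and the sample arguments are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: allowlist validation for command parameters.
// The patterns are the ones listed in the Fix Design.
final class CmdParamValidator {

    // Scenario 1: each diag.sh argument (project, jobId, path, ...).
    static final Pattern DIAG_PARAM = Pattern.compile("^[a-zA-Z0-9_./-]+$");

    // Scenario 2: InfluxDB address (host with optional port) and database name.
    static final Pattern INFLUX_ADDRESS = Pattern.compile("[a-zA-Z0-9._-]+(:[0-9]+)?");
    static final Pattern INFLUX_DB_NAME = Pattern.compile("^[0-9a-zA-Z_-]+$");

    // Scenario 3: YARN URL handed to curl.
    static final Pattern YARN_URL =
            Pattern.compile("^(http(s)?://)?[a-zA-Z0-9._-]+(:[0-9]+)?(/[a-zA-Z0-9._-]+)*/?$");

    private CmdParamValidator() {
    }

    /** Returns the value unchanged if it fully matches the pattern, otherwise throws. */
    static String require(String value, Pattern allowed) {
        // Matcher.matches() must consume the whole input, so the unanchored
        // INFLUX_ADDRESS pattern cannot match just a safe prefix.
        if (value == null || !allowed.matcher(value).matches()) {
            throw new IllegalArgumentException("Illegal command parameter: " + value);
        }
        return value;
    }

    public static void main(String[] args) {
        // Check every diag.sh argument in turn before adding it to the command.
        List<String> cmd = new ArrayList<>();
        cmd.add("bash");
        cmd.add("diag.sh"); // hypothetical script location
        for (String param : new String[] {"my_project", "job-123", "/tmp/diag"}) {
            cmd.add(require(param, DIAG_PARAM));
        }
        // On Unix-like systems ProcessBuilder starts the program without an
        // intermediate shell, so list elements are not re-split or expanded.
        ProcessBuilder pb = new ProcessBuilder(cmd);
        System.out.println(pb.command());

        try {
            require("nohup.out2 && echo success", DIAG_PARAM);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        System.out.println(require("localhost:8086", INFLUX_ADDRESS));
    }
}
```

The sketch never calls {{start()}}; it only shows the validated argument list. Passing arguments as a list instead of one string run through {{bash -c}} removes one layer of shell parsing, but the allowlist is still useful because the called script may itself expand its arguments unsafely.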
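For scenario 4, a sketch of the single-quote wrapping, assuming the final command is parsed by a POSIX shell such as bash. The class {{ShellQuote}} and method {{singleQuote}} are hypothetical names. Inside single quotes the shell performs no expansion at all, so only an embedded single quote needs special handling.

```java
// Hypothetical sketch of the beeline-params hardening: every value is wrapped
// in single quotes so a POSIX shell treats it as one literal word.
final class ShellQuote {

    private ShellQuote() {
    }

    /** abc -> 'abc', ab'c -> 'ab'\''c' */
    static String singleQuote(String value) {
        // A single quote cannot appear inside a single-quoted string, so each one
        // is replaced by: close the quoted string, an escaped quote (\'), reopen it.
        return "'" + value.replace("'", "'\\''") + "'";
    }

    public static void main(String[] args) {
        System.out.println(singleQuote("abc"));         // 'abc'
        System.out.println(singleQuote("ab'c"));        // 'ab'\''c'
        System.out.println(singleQuote("x; rm -rf /")); // 'x; rm -rf /'
    }
}
```

With this quoting, a value such as {{x; rm -rf /}} reaches beeline as a single literal argument instead of ending the command and starting a new one.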
> Command line injection vulnerability when generating diagnostic packages via scripts
> ------------------------------------------------------------------------------------
>
> Key: KYLIN-5700
> URL: https://issues.apache.org/jira/browse/KYLIN-5700
> Project: Kylin
> Issue Type: Bug
> Components: Tools, Build and Test
> Affects Versions: 5.0-alpha
> Reporter: Yaguang Jia
> Assignee: Yaguang Jia
> Priority: Critical
> Fix For: 5.0-beta

--
This message was sent by Atlassian Jira (v8.20.10#820010)