[
https://issues.apache.org/jira/browse/HIVE-29362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated HIVE-29362:
----------------------------------
Labels: pull-request-available (was: )
> New configuration to display EXPLAIN FORMATTED in human-readable format
> -----------------------------------------------------------------------
>
> Key: HIVE-29362
> URL: https://issues.apache.org/jira/browse/HIVE-29362
> Project: Hive
> Issue Type: Improvement
> Components: HiveServer2
> Affects Versions: 4.2.0
> Reporter: Stamatis Zampetakis
> Assignee: Stamatis Zampetakis
> Priority: Major
> Labels: pull-request-available
>
> The EXPLAIN FORMATTED (introduced by HIVE-2546) returns a JSON output that is
> primarily meant to be consumed programmaticaly by other applications thus it
> is returned as a single line string with no indentation/formatting.
> However, in some cases it is convenient to display the result in a
> human-readable format. For example, in the context of qtests its much easier
> for a contributor/reviewer to understand the output when it is properly
> formatted than when everything is packed into a single line.
> In addition, having a properly formatted JSON file in git has multiple
> advantages:
> * less merge conflicts (since we don't always touch the same line)
> * cleaner history/blame (since we see exactly the part of the plan that is
> affected)
> * cleaner diff
> Last but not least, a formatted JSON file significantly improves the runtime
> of tests utilizing the
> [QOutProcessor|https://github.com/apache/hive/blob/4c94300ccc9f4419967aeacfba22975440810cc3/itests/util/src/main/java/org/apache/hadoop/hive/ql/QOutProcessor.java].
> The QOutProcessor relies heavily on pattern matching and its very slow when
> input lines are large. Currently, running TestTezTPCDS30TBPerfCliDriver with
> query4.q (modified to EXPLAIN FORMATTED) takes ~100 seconds while if the JSON
> is formatted/multi-line the time goes down to ~20 seconds.
> The goal is to add a new configuration property
> ({{{}hive.explain.formatted.indent{}}}) for controlling the display format
> (machine vs. human readable) of {{EXPLAIN FORMATTED}} statement.
> {code:sql}
> CREATE TABLE person (id INT, fname STRING);
> EXPLAIN FORMATTED SELECT fname FROM person WHERE id > 100
> {code}
> When its false (default value) we keep the current behavior and we return the
> JSON without any indentation.
> {code:json}
> {"CBOPlan":{"rels":[{"id":"0","relOp":"org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveTableScan","table":["default","person"],"table:alias":"person","inputs":[],"rowCount":1,"avgRowSize":104,"rowType":{"fields":[{"type":"INTEGER","nullable":true,"name":"id"},{"type":"VARCHAR","nullable":true,"precision":2147483647,"name":"fname"},{"type":"BIGINT","nullable":true,"name":"BLOCK__OFFSET__INSIDE__FILE"},{"type":"VARCHAR","nullable":true,"precision":2147483647,"name":"INPUT__FILE__NAME"},{"fields":[{"type":"BIGINT","nullable":true,"name":"writeid"},{"type":"INTEGER","nullable":true,"name":"bucketid"},{"type":"BIGINT","nullable":true,"name":"rowid"}],"nullable":true,"name":"ROW__ID"},{"type":"BOOLEAN","nullable":true,"name":"ROW__IS__DELETED"}],"nullable":false},"colStats":[{"name":"id","ndv":1,"minValue":-2147483648,"maxValue":2147483647},{"name":"fname","ndv":1}]},{"id":"1","relOp":"org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveFilter","condition":{"op":{"name":">","kind":"GREATER_THAN","syntax":"BINARY"},"operands":[{"input":0,"name":"$0"},{"literal":100,"type":{"type":"INTEGER","nullable":false}}]},"rowCount":1},{"id":"2","relOp":"org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveProject","fields":["fname"],"exprs":[{"input":1,"name":"$1"}],"rowCount":1}]},"optimizedSQL":"SELECT
> `fname`\nFROM `default`.`person`\nWHERE `id` > 100","cboInfo":"Plan
> optimized by CBO.","STAGE DEPENDENCIES":{"Stage-0":{"ROOT
> STAGE":"TRUE"}},"STAGE PLANS":{"Stage-0":{"Fetch
> Operator":{"limit:":"-1","Processor
> Tree:":{"TableScan":{"alias:":"person","columns:":["id","fname"],"database:":"default","filterExpr:":"(id
> > 100) (type:
> boolean)","table:":"person","isTempTable:":"false","OperatorId:":"TS_0","children":{"Filter
> Operator":{"predicate:":"(id > 100) (type:
> boolean)","OperatorId:":"FIL_4","children":{"Select
> Operator":{"expressions:":"fname (type:
> string)","columnExprMap:":{"_col0":"fname"},"outputColumnNames:":["_col0"],"OperatorId:":"SEL_2","children":{"ListSink":{"OperatorId:":"LIST_SINK_5"}}}}}}}}}}}}
> {code}
> When its true (true for all qtests) we return a human-readable JSON output,
> split in multiple lines with proper indentation.
> {code:json}
> {
> "CBOPlan": {
> "rels": [
> {
> "id": "0",
> "relOp":
> "org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveTableScan",
> "table": [
> "default",
> "person"
> ],
> "table:alias": "person",
> "inputs": [],
> "rowCount": 1,
> "avgRowSize": 104,
> "rowType": {
> "fields": [
> {
> "type": "INTEGER",
> "nullable": true,
> "name": "id"
> },
> {
> "type": "VARCHAR",
> "nullable": true,
> "precision": 2147483647,
> "name": "fname"
> },
> {
> "type": "BIGINT",
> "nullable": true,
> "name": "BLOCK__OFFSET__INSIDE__FILE"
> },
> {
> "type": "VARCHAR",
> "nullable": true,
> "precision": 2147483647,
> "name": "INPUT__FILE__NAME"
> },
> {
> "fields": [
> {
> "type": "BIGINT",
> "nullable": true,
> "name": "writeid"
> },
> {
> "type": "INTEGER",
> "nullable": true,
> "name": "bucketid"
> },
> {
> "type": "BIGINT",
> "nullable": true,
> "name": "rowid"
> }
> ],
> "nullable": true,
> "name": "ROW__ID"
> },
> {
> "type": "BOOLEAN",
> "nullable": true,
> "name": "ROW__IS__DELETED"
> }
> ],
> "nullable": false
> },
> "colStats": [
> {
> "name": "id",
> "ndv": 1,
> "minValue": -2147483648,
> "maxValue": 2147483647
> },
> {
> "name": "fname",
> "ndv": 1
> }
> ]
> },
> {
> "id": "1",
> "relOp":
> "org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveFilter",
> "condition": {
> "op": {
> "name": ">",
> "kind": "GREATER_THAN",
> "syntax": "BINARY"
> },
> "operands": [
> {
> "input": 0,
> "name": "$0"
> },
> {
> "literal": 100,
> "type": {
> "type": "INTEGER",
> "nullable": false
> }
> }
> ]
> },
> "rowCount": 1
> },
> {
> "id": "2",
> "relOp":
> "org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveProject",
> "fields": [
> "fname"
> ],
> "exprs": [
> {
> "input": 1,
> "name": "$1"
> }
> ],
> "rowCount": 1
> }
> ]
> },
> "optimizedSQL": "SELECT `fname`\nFROM `default`.`person`\nWHERE `id` > 100",
> "cboInfo": "Plan optimized by CBO.",
> "STAGE DEPENDENCIES": {
> "Stage-0": {
> "ROOT STAGE": "TRUE"
> }
> },
> "STAGE PLANS": {
> "Stage-0": {
> "Fetch Operator": {
> "limit:": "-1",
> "Processor Tree:": {
> "TableScan": {
> "alias:": "person",
> "columns:": [
> "id",
> "fname"
> ],
> "database:": "default",
> "filterExpr:": "(id > 100) (type: boolean)",
> "table:": "person",
> "isTempTable:": "false",
> "OperatorId:": "TS_0",
> "children": {
> "Filter Operator": {
> "predicate:": "(id > 100) (type: boolean)",
> "OperatorId:": "FIL_4",
> "children": {
> "Select Operator": {
> "expressions:": "fname (type: string)",
> "columnExprMap:": {
> "_col0": "fname"
> },
> "outputColumnNames:": [
> "_col0"
> ],
> "OperatorId:": "SEL_2",
> "children": {
> "ListSink": {
> "OperatorId:": "LIST_SINK_5"
> }
> }
> }
> }
> }
> }
> }
> }
> }
> }
> }
> }
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)