[ 
https://issues.apache.org/jira/browse/HIVE-29362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-29362:
----------------------------------
    Labels: pull-request-available  (was: )

> New configuration to display EXPLAIN FORMATTED in human-readable format
> -----------------------------------------------------------------------
>
>                 Key: HIVE-29362
>                 URL: https://issues.apache.org/jira/browse/HIVE-29362
>             Project: Hive
>          Issue Type: Improvement
>          Components: HiveServer2
>    Affects Versions: 4.2.0
>            Reporter: Stamatis Zampetakis
>            Assignee: Stamatis Zampetakis
>            Priority: Major
>              Labels: pull-request-available
>
> The EXPLAIN FORMATTED (introduced by HIVE-2546) returns a JSON output that is 
> primarily meant to be consumed programmaticaly by other applications thus it 
> is returned as a single line string with no indentation/formatting.
> However, in some cases it is convenient to display the result in a 
> human-readable format. For example, in the context of qtests its much easier 
> for a contributor/reviewer to understand the output when it is properly 
> formatted than when everything is packed into a single line.
> In addition, having a properly formatted JSON file in git has multiple 
> advantages:
>  * less merge conflicts (since we don't always touch the same line)
>  * cleaner history/blame (since we see exactly the part of the plan that is 
> affected)
>  * cleaner diff
> Last but not least, a formatted JSON file significantly improves the runtime 
> of tests utilizing the 
> [QOutProcessor|https://github.com/apache/hive/blob/4c94300ccc9f4419967aeacfba22975440810cc3/itests/util/src/main/java/org/apache/hadoop/hive/ql/QOutProcessor.java].
>  The QOutProcessor relies heavily on pattern matching and its very slow when 
> input lines are large. Currently, running TestTezTPCDS30TBPerfCliDriver with 
> query4.q (modified to EXPLAIN FORMATTED) takes ~100 seconds while if the JSON 
> is formatted/multi-line the time goes down to ~20 seconds.
> The goal is to add a new configuration property 
> ({{{}hive.explain.formatted.indent{}}}) for controlling the display format 
> (machine vs. human readable) of {{EXPLAIN FORMATTED}} statement.
> {code:sql}
> CREATE TABLE person (id INT, fname STRING);
> EXPLAIN FORMATTED SELECT fname FROM person WHERE id > 100
> {code}
> When its false (default value) we keep the current behavior and we return the 
> JSON without any indentation.
> {code:json}
> {"CBOPlan":{"rels":[{"id":"0","relOp":"org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveTableScan","table":["default","person"],"table:alias":"person","inputs":[],"rowCount":1,"avgRowSize":104,"rowType":{"fields":[{"type":"INTEGER","nullable":true,"name":"id"},{"type":"VARCHAR","nullable":true,"precision":2147483647,"name":"fname"},{"type":"BIGINT","nullable":true,"name":"BLOCK__OFFSET__INSIDE__FILE"},{"type":"VARCHAR","nullable":true,"precision":2147483647,"name":"INPUT__FILE__NAME"},{"fields":[{"type":"BIGINT","nullable":true,"name":"writeid"},{"type":"INTEGER","nullable":true,"name":"bucketid"},{"type":"BIGINT","nullable":true,"name":"rowid"}],"nullable":true,"name":"ROW__ID"},{"type":"BOOLEAN","nullable":true,"name":"ROW__IS__DELETED"}],"nullable":false},"colStats":[{"name":"id","ndv":1,"minValue":-2147483648,"maxValue":2147483647},{"name":"fname","ndv":1}]},{"id":"1","relOp":"org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveFilter","condition":{"op":{"name":">","kind":"GREATER_THAN","syntax":"BINARY"},"operands":[{"input":0,"name":"$0"},{"literal":100,"type":{"type":"INTEGER","nullable":false}}]},"rowCount":1},{"id":"2","relOp":"org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveProject","fields":["fname"],"exprs":[{"input":1,"name":"$1"}],"rowCount":1}]},"optimizedSQL":"SELECT
>  `fname`\nFROM `default`.`person`\nWHERE `id` > 100","cboInfo":"Plan 
> optimized by CBO.","STAGE DEPENDENCIES":{"Stage-0":{"ROOT 
> STAGE":"TRUE"}},"STAGE PLANS":{"Stage-0":{"Fetch 
> Operator":{"limit:":"-1","Processor 
> Tree:":{"TableScan":{"alias:":"person","columns:":["id","fname"],"database:":"default","filterExpr:":"(id
>  > 100) (type: 
> boolean)","table:":"person","isTempTable:":"false","OperatorId:":"TS_0","children":{"Filter
>  Operator":{"predicate:":"(id > 100) (type: 
> boolean)","OperatorId:":"FIL_4","children":{"Select 
> Operator":{"expressions:":"fname (type: 
> string)","columnExprMap:":{"_col0":"fname"},"outputColumnNames:":["_col0"],"OperatorId:":"SEL_2","children":{"ListSink":{"OperatorId:":"LIST_SINK_5"}}}}}}}}}}}}
> {code}
> When its true (true for all qtests) we return a human-readable JSON output, 
> split in multiple lines with proper indentation.
> {code:json}
> {
>   "CBOPlan": {
>     "rels": [
>       {
>         "id": "0",
>         "relOp": 
> "org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveTableScan",
>         "table": [
>           "default",
>           "person"
>         ],
>         "table:alias": "person",
>         "inputs": [],
>         "rowCount": 1,
>         "avgRowSize": 104,
>         "rowType": {
>           "fields": [
>             {
>               "type": "INTEGER",
>               "nullable": true,
>               "name": "id"
>             },
>             {
>               "type": "VARCHAR",
>               "nullable": true,
>               "precision": 2147483647,
>               "name": "fname"
>             },
>             {
>               "type": "BIGINT",
>               "nullable": true,
>               "name": "BLOCK__OFFSET__INSIDE__FILE"
>             },
>             {
>               "type": "VARCHAR",
>               "nullable": true,
>               "precision": 2147483647,
>               "name": "INPUT__FILE__NAME"
>             },
>             {
>               "fields": [
>                 {
>                   "type": "BIGINT",
>                   "nullable": true,
>                   "name": "writeid"
>                 },
>                 {
>                   "type": "INTEGER",
>                   "nullable": true,
>                   "name": "bucketid"
>                 },
>                 {
>                   "type": "BIGINT",
>                   "nullable": true,
>                   "name": "rowid"
>                 }
>               ],
>               "nullable": true,
>               "name": "ROW__ID"
>             },
>             {
>               "type": "BOOLEAN",
>               "nullable": true,
>               "name": "ROW__IS__DELETED"
>             }
>           ],
>           "nullable": false
>         },
>         "colStats": [
>           {
>             "name": "id",
>             "ndv": 1,
>             "minValue": -2147483648,
>             "maxValue": 2147483647
>           },
>           {
>             "name": "fname",
>             "ndv": 1
>           }
>         ]
>       },
>       {
>         "id": "1",
>         "relOp": 
> "org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveFilter",
>         "condition": {
>           "op": {
>             "name": ">",
>             "kind": "GREATER_THAN",
>             "syntax": "BINARY"
>           },
>           "operands": [
>             {
>               "input": 0,
>               "name": "$0"
>             },
>             {
>               "literal": 100,
>               "type": {
>                 "type": "INTEGER",
>                 "nullable": false
>               }
>             }
>           ]
>         },
>         "rowCount": 1
>       },
>       {
>         "id": "2",
>         "relOp": 
> "org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveProject",
>         "fields": [
>           "fname"
>         ],
>         "exprs": [
>           {
>             "input": 1,
>             "name": "$1"
>           }
>         ],
>         "rowCount": 1
>       }
>     ]
>   },
>   "optimizedSQL": "SELECT `fname`\nFROM `default`.`person`\nWHERE `id` > 100",
>   "cboInfo": "Plan optimized by CBO.",
>   "STAGE DEPENDENCIES": {
>     "Stage-0": {
>       "ROOT STAGE": "TRUE"
>     }
>   },
>   "STAGE PLANS": {
>     "Stage-0": {
>       "Fetch Operator": {
>         "limit:": "-1",
>         "Processor Tree:": {
>           "TableScan": {
>             "alias:": "person",
>             "columns:": [
>               "id",
>               "fname"
>             ],
>             "database:": "default",
>             "filterExpr:": "(id > 100) (type: boolean)",
>             "table:": "person",
>             "isTempTable:": "false",
>             "OperatorId:": "TS_0",
>             "children": {
>               "Filter Operator": {
>                 "predicate:": "(id > 100) (type: boolean)",
>                 "OperatorId:": "FIL_4",
>                 "children": {
>                   "Select Operator": {
>                     "expressions:": "fname (type: string)",
>                     "columnExprMap:": {
>                       "_col0": "fname"
>                     },
>                     "outputColumnNames:": [
>                       "_col0"
>                     ],
>                     "OperatorId:": "SEL_2",
>                     "children": {
>                       "ListSink": {
>                         "OperatorId:": "LIST_SINK_5"
>                       }
>                     }
>                   }
>                 }
>               }
>             }
>           }
>         }
>       }
>     }
>   }
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to