Stamatis Zampetakis created HIVE-29357:
------------------------------------------

             Summary: Change CBOPlan in EXPLAIN FORMATTED from plain string to JSON object
                 Key: HIVE-29357
                 URL: https://issues.apache.org/jira/browse/HIVE-29357
             Project: Hive
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 4.2.0
            Reporter: Stamatis Zampetakis
            Assignee: Stamatis Zampetakis


Currently, the value of the CBOPlan attribute in the result of EXPLAIN FORMATTED 
is a plain string.
{code:sql}
CREATE TABLE person (id INTEGER, country STRING);
EXPLAIN FORMATTED CBO SELECT country FROM person;
{code}
{code:json}
{"CBOPlan":"{\n  \"rels\": [\n    {\n      \"id\": \"0\",\n      \"relOp\": 
\"org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveTableScan\",\n   
   \"table\": [\n        \"default\",\n        \"person\"\n      ],\n      
\"table:alias\": \"person\",\n      \"inputs\": [],\n      \"rowCount\": 1.0,\n 
     \"avgRowSize\": 233.0,\n      \"rowType\": {\n        \"fields\": [\n      
    {\n            \"type\": \"INTEGER\",\n            \"nullable\": true,\n    
        \"name\": \"id\"\n          },\n          {\n            \"type\": 
\"VARCHAR\",\n            \"nullable\": true,\n            \"precision\": 
2147483647,\n            \"name\": \"country\"\n          },\n          {\n     
       \"type\": \"BIGINT\",\n            \"nullable\": true,\n            
\"name\": \"BLOCK__OFFSET__INSIDE__FILE\"\n          },\n          {\n          
  \"type\": \"VARCHAR\",\n            \"nullable\": true,\n            
\"precision\": 2147483647,\n            \"name\": \"INPUT__FILE__NAME\"\n       
   },\n          {\n            \"fields\": [\n              {\n                
\"type\": \"BIGINT\",\n                \"nullable\": true,\n                
\"name\": \"writeid\"\n              },\n              {\n                
\"type\": \"INTEGER\",\n                \"nullable\": true,\n                
\"name\": \"bucketid\"\n              },\n              {\n                
\"type\": \"BIGINT\",\n                \"nullable\": true,\n                
\"name\": \"rowid\"\n              }\n            ],\n            \"nullable\": 
true,\n            \"name\": \"ROW__ID\"\n          },\n          {\n           
 \"type\": \"BOOLEAN\",\n            \"nullable\": true,\n            \"name\": 
\"ROW__IS__DELETED\"\n          }\n        ],\n        \"nullable\": false\n    
  },\n      \"colStats\": [\n        {\n          \"name\": \"country\",\n      
    \"ndv\": 1\n        },\n        {\n          \"name\": \"id\",\n          
\"ndv\": 1,\n          \"minValue\": -2147483648,\n          \"maxValue\": 
2147483647\n        }\n      ]\n    },\n    {\n      \"id\": \"1\",\n      
\"relOp\": 
\"org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveProject\",\n     
 \"fields\": [\n        \"country\"\n      ],\n      \"exprs\": [\n        {\n  
        \"input\": 1,\n          \"name\": \"$1\"\n        }\n      ],\n      
\"rowCount\": 1.0\n    }\n  ]\n}"}
{code}
Observe that the value of CBOPlan is in fact a JSON object, so wrapping it in a 
string has several drawbacks:
 * Bigger size, due to unnecessary whitespace and escaped characters
 * Poor readability, since JSON processors treat the value as an opaque string 
and cannot format it
 * Deserialization overhead, since consumers need to read the value of "CBOPlan" 
and parse it a second time into a JSON object before processing it further
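
The size and double-parsing drawbacks can be illustrated with a minimal Python sketch. The `plan` dictionary below is a hypothetical, trimmed-down stand-in for a real CBO plan, and the two serializations only approximate what Hive emits today and what this issue proposes:

```python
import json

# Hypothetical, trimmed-down plan used only for illustration.
plan = {"rels": [{"id": "0", "relOp": "HiveTableScan",
                  "table": ["default", "person"], "inputs": []}]}

# Approximation of the current output: the plan is pretty-printed
# and then embedded as a string, so quotes and newlines get escaped.
current = json.dumps({"CBOPlan": json.dumps(plan, indent=2)})

# Approximation of the proposed output: the plan is embedded as an object.
proposed = json.dumps({"CBOPlan": plan}, separators=(",", ":"))

# A consumer of the current format has to parse twice.
outer = json.loads(current)
plan_from_current = json.loads(outer["CBOPlan"])

# A consumer of the proposed format parses once.
plan_from_proposed = json.loads(proposed)["CBOPlan"]

# Both routes yield the same plan; only the cost and the size differ.
same_plan = plan_from_current == plan_from_proposed == plan
size_current, size_proposed = len(current), len(proposed)
```

Here `size_current` is larger than `size_proposed` because the string form carries the indentation plus one escape sequence for every quote and newline.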

The goal is to return the value of CBOPlan as a JSON object:
{code:json}
{"CBOPlan":{"rels":[{"id":"0","relOp":"org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveTableScan","table":["default","person"],"table:alias":"person","inputs":[],"rowCount":1,"avgRowSize":233,"rowType":{"fields":[{"type":"INTEGER","nullable":true,"name":"id"},{"type":"VARCHAR","nullable":true,"precision":2147483647,"name":"country"},{"type":"BIGINT","nullable":true,"name":"BLOCK__OFFSET__INSIDE__FILE"},{"type":"VARCHAR","nullable":true,"precision":2147483647,"name":"INPUT__FILE__NAME"},{"fields":[{"type":"BIGINT","nullable":true,"name":"writeid"},{"type":"INTEGER","nullable":true,"name":"bucketid"},{"type":"BIGINT","nullable":true,"name":"rowid"}],"nullable":true,"name":"ROW__ID"},{"type":"BOOLEAN","nullable":true,"name":"ROW__IS__DELETED"}],"nullable":false},"colStats":[{"name":"country","ndv":1},{"name":"id","ndv":1,"minValue":-2147483648,"maxValue":2147483647}]},{"id":"1","relOp":"org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveProject","fields":["country"],"exprs":[{"input":1,"name":"$1"}],"rowCount":1}]}}
{code}
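
Consumers that need to work across Hive versions could accept both shapes during a transition. The following is only a sketch of one possible approach; `read_cbo_plan` and `rel_ops` are hypothetical helper names, not part of Hive:

```python
import json


def read_cbo_plan(explain_output: str) -> dict:
    """Return the CBO plan as a dict, whichever way it was serialized."""
    doc = json.loads(explain_output)
    plan = doc["CBOPlan"]
    # Old format: the plan is a JSON document wrapped in a string.
    if isinstance(plan, str):
        plan = json.loads(plan)
    return plan


def rel_ops(plan: dict) -> list:
    """List the short operator names in plan order."""
    return [rel["relOp"].rsplit(".", 1)[-1] for rel in plan["rels"]]


# Hypothetical shortened outputs in the old and the proposed format.
op = "org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveProject"
new_style = json.dumps({"CBOPlan": {"rels": [{"id": "1", "relOp": op}]}})
old_style = json.dumps({"CBOPlan": json.dumps({"rels": [{"id": "1", "relOp": op}]})})

ops_new = rel_ops(read_cbo_plan(new_style))
ops_old = rel_ops(read_cbo_plan(old_style))
```

With the proposed format the `isinstance` branch is never taken, so the helper reduces to a single `json.loads` call.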



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
