clintropolis opened a new pull request, #13356:
URL: https://github.com/apache/druid/pull/13356

   ### Description
   This PR adds a `--dump nested` mode to the `dump-segment` tool. The new mode has two variants for examining the internals of Druid nested columns. It always requires a single `--column` argument, which must name a Druid nested column, and accepts an optional `--nested-path $.some.json.path` argument that dumps a specific field of that nested column.
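
As a rough illustration of how the two variants differ on the command line, the sketch below assembles an argument list in Python. Only `--dump nested`, `--column`, and `--nested-path` come from this PR. The `java ... org.apache.druid.cli.Main tools dump-segment --directory ... --out ...` prefix is assumed from the general `dump-segment` usage, and the helper name, classpath, and paths are hypothetical.

```python
from typing import List, Optional


def build_dump_nested_command(
    segment_dir: str,
    column: str,
    nested_path: Optional[str] = None,
    out_file: Optional[str] = None,
    classpath: str = "lib/*",  # hypothetical classpath, adjust to your install
) -> List[str]:
    """Build an argv list for `dump-segment --dump nested` (illustrative only)."""
    cmd = [
        "java",
        "-classpath", classpath,
        "-Ddruid.extensions.loadList=[]",
        "org.apache.druid.cli.Main", "tools", "dump-segment",
        "--directory", segment_dir,
        "--dump", "nested",
        "--column", column,  # required: must be a nested column
    ]
    if nested_path is not None:
        # Optional: dump a single field of the nested column.
        cmd += ["--nested-path", nested_path]
    if out_file is not None:
        cmd += ["--out", out_file]
    return cmd


if __name__ == "__main__":
    # Hypothetical segment directory and column name.
    print(build_dump_nested_command("/tmp/segment", "nest"))
    print(build_dump_nested_command("/tmp/segment", "nest", nested_path="$.x"))
```

The list form can be passed to `subprocess.run` without shell quoting concerns for paths such as `$.x`.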
   
   (from the added docs)
   
   If `--nested-path` is not specified, the output will contain the list of nested fields and their types, the global value dictionaries, and the list of null rows.
   
   Sample output:
   ```json
   {
     "nest": {
       "fields": [
         {
           "path": "$.x",
           "types": [
             "LONG"
           ]
         },
         {
           "path": "$.y",
           "types": [
             "DOUBLE"
           ]
         },
         {
           "path": "$.z",
           "types": [
             "STRING"
           ]
         }
       ],
       "dictionaries": {
         "strings": [
           {
             "globalId": 0,
             "value": null
           },
           {
             "globalId": 1,
             "value": "a"
           },
           {
             "globalId": 2,
             "value": "b"
           }
         ],
         "longs": [
           {
             "globalId": 3,
             "value": 100
           },
           {
             "globalId": 4,
             "value": 200
           },
           {
             "globalId": 5,
             "value": 400
           }
         ],
         "doubles": [
           {
             "globalId": 6,
             "value": 1.1
           },
           {
             "globalId": 7,
             "value": 2.2
           },
           {
             "globalId": 8,
             "value": 3.3
           }
         ],
         "nullRows": []
       }
     }
   }
   ```
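
The sample above shows the global ids running contiguously across the string, long, and double dictionaries. A minimal sketch of turning that output into a single `globalId` lookup follows. It assumes the JSON shape of the sample (including `nullRows` sitting alongside the dictionaries), and the function name and the embedded copy of the dictionaries are illustrative only.

```python
import json

# Dictionaries copied from the sample output above ("fields" omitted).
SAMPLE_DUMP = json.loads("""
{
  "nest": {
    "dictionaries": {
      "strings": [
        {"globalId": 0, "value": null},
        {"globalId": 1, "value": "a"},
        {"globalId": 2, "value": "b"}
      ],
      "longs": [
        {"globalId": 3, "value": 100},
        {"globalId": 4, "value": 200},
        {"globalId": 5, "value": 400}
      ],
      "doubles": [
        {"globalId": 6, "value": 1.1},
        {"globalId": 7, "value": 2.2},
        {"globalId": 8, "value": 3.3}
      ],
      "nullRows": []
    }
  }
}
""")


def global_lookup(dump, column):
    """Map each globalId to (dictionary name, value) for one nested column."""
    lookup = {}
    for name, entries in dump[column]["dictionaries"].items():
        if name == "nullRows":
            # In the sample, nullRows is a list of row numbers, not a dictionary.
            continue
        for entry in entries:
            lookup[entry["globalId"]] = (name, entry["value"])
    return lookup


if __name__ == "__main__":
    lookup = global_lookup(SAMPLE_DUMP, "nest")
    for global_id in sorted(lookup):
        print(global_id, lookup[global_id])
```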
   
   If `--nested-path` is specified, the output will instead contain the types of the nested field; the local value dictionary, with the globalId, the value, and the uncompressed bitmap index (the list of row numbers which contain the value) for each entry; and a dump of the column itself, with the row number, the raw JSON form of the nested column, the local dictionary id of the field for that row, and the value of the field for that row.
   
   Sample output:
   ```json
   {
     "bitmapSerdeFactory": {
       "type": "roaring",
       "compressRunOnSerialization": true
     },
     "nest": {
       "$.x": {
         "types": [
           "LONG"
         ],
         "dictionary": [
           {
             "localId": 0,
             "globalId": 0,
             "value": null,
             "rows": [
               4
             ]
           },
           {
             "localId": 1,
             "globalId": 3,
             "value": "100",
             "rows": [
               3
             ]
           },
           {
             "localId": 2,
             "globalId": 4,
             "value": "200",
             "rows": [
               0,
               2
             ]
           },
           {
             "localId": 3,
             "globalId": 5,
             "value": "400",
             "rows": [
               1
             ]
           }
         ],
         "column": [
           {
             "row": 0,
             "raw": {
               "x": 200,
               "y": 2.2
             },
             "fieldId": 2,
             "fieldValue": "200"
           },
           {
             "row": 1,
             "raw": {
               "x": 400,
               "y": 1.1,
               "z": "a"
             },
             "fieldId": 3,
             "fieldValue": "400"
           },
           {
             "row": 2,
             "raw": {
               "x": 200,
               "z": "b"
             },
             "fieldId": 2,
             "fieldValue": "200"
           },
           {
             "row": 3,
             "raw": {
               "x": 100,
               "y": 1.1,
               "z": "a"
             },
             "fieldId": 1,
             "fieldValue": "100"
           },
           {
             "row": 4,
             "raw": {
               "y": 3.3,
               "z": "b"
             },
             "fieldId": 0,
             "fieldValue": null
           }
         ]
       }
     }
   }
   ```
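
In the sample above, each dictionary entry's `rows` list agrees with the rows whose `fieldId` equals that entry's `localId`. The sketch below checks that relationship on a trimmed copy of the sample (the `raw` objects are omitted). It assumes the JSON shape shown here, and the function names are hypothetical rather than part of the tool.

```python
from collections import defaultdict

# Trimmed copy of the "$.x" field dump from the sample output above.
FIELD_DUMP = {
    "types": ["LONG"],
    "dictionary": [
        {"localId": 0, "globalId": 0, "value": None, "rows": [4]},
        {"localId": 1, "globalId": 3, "value": "100", "rows": [3]},
        {"localId": 2, "globalId": 4, "value": "200", "rows": [0, 2]},
        {"localId": 3, "globalId": 5, "value": "400", "rows": [1]},
    ],
    "column": [
        {"row": 0, "fieldId": 2, "fieldValue": "200"},
        {"row": 1, "fieldId": 3, "fieldValue": "400"},
        {"row": 2, "fieldId": 2, "fieldValue": "200"},
        {"row": 3, "fieldId": 1, "fieldValue": "100"},
        {"row": 4, "fieldId": 0, "fieldValue": None},
    ],
}


def rows_by_field_id(column_entries):
    """Group row numbers by the local dictionary id stored for each row."""
    rows = defaultdict(list)
    for entry in column_entries:
        rows[entry["fieldId"]].append(entry["row"])
    return dict(rows)


def mismatched_local_ids(field_dump):
    """Return localIds whose bitmap rows disagree with the column dump."""
    derived = rows_by_field_id(field_dump["column"])
    mismatches = []
    for entry in field_dump["dictionary"]:
        from_bitmap = sorted(entry["rows"])
        from_column = sorted(derived.get(entry["localId"], []))
        if from_bitmap != from_column:
            mismatches.append(entry["localId"])
    return mismatches


if __name__ == "__main__":
    print(rows_by_field_id(FIELD_DUMP["column"]))
    print(mismatched_local_ids(FIELD_DUMP))
```

A non-empty result from a check like this would point at a disagreement between the bitmap index and the column values for that field.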
   
   <hr>
   This PR has:
   
   - [x] been self-reviewed.
   - [x] added documentation for new or modified features or behaviors.
   - [ ] a release note entry in the PR description.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked 
related entities via Javadoc links.
   - [ ] added comments explaining the "why" and the intent of the code 
wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, 
ensuring the threshold for [code 
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
 is met.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

