clintropolis opened a new pull request, #13356:
URL: https://github.com/apache/druid/pull/13356
### Description
This PR adds a `--dump nested` mode to the `dump-segment` tool, which has 2
modes for examining the internals of Druid nested columns. It always
requires a single `--column` argument, which must name a Druid nested column, and
accepts an optional `--nested-path $.some.json.path` argument that dumps a
specific field of that nested column.
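
As a rough illustration of how these arguments combine, here is a minimal Python sketch that assembles the argument tail for the tool. The helper name `dump_nested_args` and the example paths are hypothetical. The `--directory` and `--out` options are assumed from the existing `dump-segment` documentation, and the Java launcher prefix (classpath and main class) is deliberately left out.

```python
from typing import List, Optional


def dump_nested_args(directory: str, out: str, column: str,
                     nested_path: Optional[str] = None) -> List[str]:
    """Build the dump-segment argument tail for the `--dump nested` mode.

    Hypothetical helper: it only assembles arguments and does not launch Druid.
    """
    args = [
        "tools", "dump-segment",
        "--directory", directory,  # assumed existing dump-segment option
        "--out", out,              # assumed existing dump-segment option
        "--dump", "nested",
        "--column", column,        # always required in this mode
    ]
    if nested_path is not None:
        # Optional: switches the output to the single-field dump.
        args += ["--nested-path", nested_path]
    return args


# Whole-column summary (fields, global dictionaries, null rows).
print(dump_nested_args("/path/to/segment", "/tmp/nest.json", "nest"))
# Dump of a single nested field.
print(dump_nested_args("/path/to/segment", "/tmp/nest-x.json", "nest", "$.x"))
```

When passing a path such as `$.x` through a shell, it would typically need quoting so that `$` is not expanded.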
(from the added docs)
If `--nested-path` is not specified, the output will contain the list of
nested fields and their types, the global value dictionaries, and the list of
null rows.
Sample output:
```json
{
  "nest": {
    "fields": [
      {
        "path": "$.x",
        "types": [
          "LONG"
        ]
      },
      {
        "path": "$.y",
        "types": [
          "DOUBLE"
        ]
      },
      {
        "path": "$.z",
        "types": [
          "STRING"
        ]
      }
    ],
    "dictionaries": {
      "strings": [
        {
          "globalId": 0,
          "value": null
        },
        {
          "globalId": 1,
          "value": "a"
        },
        {
          "globalId": 2,
          "value": "b"
        }
      ],
      "longs": [
        {
          "globalId": 3,
          "value": 100
        },
        {
          "globalId": 4,
          "value": 200
        },
        {
          "globalId": 5,
          "value": 400
        }
      ],
      "doubles": [
        {
          "globalId": 6,
          "value": 1.1
        },
        {
          "globalId": 7,
          "value": 2.2
        },
        {
          "globalId": 8,
          "value": 3.3
        }
      ],
      "nullRows": []
    }
  }
}
```
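
To show how this summary might be consumed, the following Python sketch parses a compact copy of the sample output and derives two lookups: field path to types, and `globalId` to value across the three global dictionaries. The function names are illustrative and not part of Druid; in practice the JSON would be read from the tool's output file.

```python
import json

# Compact copy of the sample output above.
SAMPLE_DUMP = json.loads("""
{
  "nest": {
    "fields": [
      {"path": "$.x", "types": ["LONG"]},
      {"path": "$.y", "types": ["DOUBLE"]},
      {"path": "$.z", "types": ["STRING"]}
    ],
    "dictionaries": {
      "strings": [
        {"globalId": 0, "value": null},
        {"globalId": 1, "value": "a"},
        {"globalId": 2, "value": "b"}
      ],
      "longs": [
        {"globalId": 3, "value": 100},
        {"globalId": 4, "value": 200},
        {"globalId": 5, "value": 400}
      ],
      "doubles": [
        {"globalId": 6, "value": 1.1},
        {"globalId": 7, "value": 2.2},
        {"globalId": 8, "value": 3.3}
      ],
      "nullRows": []
    }
  }
}
""")


def field_types(dump, column):
    """Map each nested field path to its list of types."""
    return {f["path"]: f["types"] for f in dump[column]["fields"]}


def global_dictionary(dump, column):
    """Merge the string, long, and double dictionaries into one globalId lookup."""
    dictionaries = dump[column]["dictionaries"]
    lookup = {}
    for kind in ("strings", "longs", "doubles"):
        for entry in dictionaries.get(kind, []):
            lookup[entry["globalId"]] = entry["value"]
    return lookup


print(field_types(SAMPLE_DUMP, "nest"))
print(global_dictionary(SAMPLE_DUMP, "nest"))
```

In the sample, the global ids run contiguously across the string, long, and double dictionaries, which is why a single merged lookup is convenient here.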
If `--nested-path` is specified, the output will instead contain:
- the types of the nested field,
- the local value dictionary, where each entry includes the globalId, the
value, and the uncompressed bitmap index for that value (the list of row
numbers which contain the value),
- a dump of the column itself, where each entry contains the row number, the
raw JSON form of the nested column, the local dictionary id of the field for
that row, and the value of the field for that row.
Sample output:
```json
{
  "bitmapSerdeFactory": {
    "type": "roaring",
    "compressRunOnSerialization": true
  },
  "nest": {
    "$.x": {
      "types": [
        "LONG"
      ],
      "dictionary": [
        {
          "localId": 0,
          "globalId": 0,
          "value": null,
          "rows": [
            4
          ]
        },
        {
          "localId": 1,
          "globalId": 3,
          "value": "100",
          "rows": [
            3
          ]
        },
        {
          "localId": 2,
          "globalId": 4,
          "value": "200",
          "rows": [
            0,
            2
          ]
        },
        {
          "localId": 3,
          "globalId": 5,
          "value": "400",
          "rows": [
            1
          ]
        }
      ],
      "column": [
        {
          "row": 0,
          "raw": {
            "x": 200,
            "y": 2.2
          },
          "fieldId": 2,
          "fieldValue": "200"
        },
        {
          "row": 1,
          "raw": {
            "x": 400,
            "y": 1.1,
            "z": "a"
          },
          "fieldId": 3,
          "fieldValue": "400"
        },
        {
          "row": 2,
          "raw": {
            "x": 200,
            "z": "b"
          },
          "fieldId": 2,
          "fieldValue": "200"
        },
        {
          "row": 3,
          "raw": {
            "x": 100,
            "y": 1.1,
            "z": "a"
          },
          "fieldId": 1,
          "fieldValue": "100"
        },
        {
          "row": 4,
          "raw": {
            "y": 3.3,
            "z": "b"
          },
          "fieldId": 0,
          "fieldValue": null
        }
      ]
    }
  }
}
```
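
One way to use the single-field dump is to check that the bitmap index and the column agree. The Python sketch below is an illustrative consistency check, not something the tool itself does: for every row in `column`, it verifies that the row number appears in the `rows` list of the dictionary entry whose `localId` equals the row's `fieldId`, and that the values match. The `raw` objects are omitted from the embedded copy of the sample for brevity, and the function name is hypothetical.

```python
import json

# The "$.x" portion of the sample output above, with "raw" omitted.
FIELD_DUMP = json.loads("""
{
  "types": ["LONG"],
  "dictionary": [
    {"localId": 0, "globalId": 0, "value": null, "rows": [4]},
    {"localId": 1, "globalId": 3, "value": "100", "rows": [3]},
    {"localId": 2, "globalId": 4, "value": "200", "rows": [0, 2]},
    {"localId": 3, "globalId": 5, "value": "400", "rows": [1]}
  ],
  "column": [
    {"row": 0, "fieldId": 2, "fieldValue": "200"},
    {"row": 1, "fieldId": 3, "fieldValue": "400"},
    {"row": 2, "fieldId": 2, "fieldValue": "200"},
    {"row": 3, "fieldId": 1, "fieldValue": "100"},
    {"row": 4, "fieldId": 0, "fieldValue": null}
  ]
}
""")


def check_field_dump(field):
    """Return a list of mismatches between the bitmap index and the column."""
    rows_by_local_id = {e["localId"]: set(e["rows"]) for e in field["dictionary"]}
    value_by_local_id = {e["localId"]: e["value"] for e in field["dictionary"]}
    problems = []
    for entry in field["column"]:
        row, field_id = entry["row"], entry["fieldId"]
        if field_id not in rows_by_local_id:
            problems.append(f"row {row}: unknown local id {field_id}")
            continue
        if row not in rows_by_local_id[field_id]:
            problems.append(f"row {row}: missing from bitmap of local id {field_id}")
        if value_by_local_id[field_id] != entry["fieldValue"]:
            problems.append(f"row {row}: value differs from dictionary entry")
    return problems


print(check_field_dump(FIELD_DUMP))
```

For the sample data the check reports no mismatches: for example, rows 0 and 2 both carry `fieldId` 2, matching the `rows` list `[0, 2]` of the entry with `localId` 2.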
<hr>
This PR has:
- [x] been self-reviewed.
- [x] added documentation for new or modified features or behaviors.
- [ ] a release note entry in the PR description.
- [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
- [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
- [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.