kaplanmaxe opened a new pull request #10230:
URL: https://github.com/apache/druid/pull/10230
Related to #8560
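This PR adds several bitwise expressions that can be used at ingestion time:
`bitwiseAnd`, `bitwiseOr`, `bitwiseXor`, `bitwiseComplement`,
`bitwiseShiftLeft`, and `bitwiseShiftRight`.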
**Use Case**
Say you're doing CDC (change data capture) and your source table has a column
of binary flags. For OLAP queries, it's extremely useful to have these flags
extracted into their own dimensions, which can now be done at ingestion time
via the new expressions.
Example:
- You might have a column in your source DB called `flags` that packs a
series of binary flags into one integer. The bit with value 1 might mean an
order was placed on mobile, the bit with value 2 might mean it was purchased
on web, and the bit with value 4 might mean it was the user's first purchase.
- Instead of ingesting the raw integer value of the `flags` column, you can
break this column out into something like `mobile_ind`, `web_ind`, and
`first_purchase_ind`, where the expressions are `bitwiseAnd(flags, 1)`,
`bitwiseAnd(flags, 2)`, and `bitwiseAnd(flags, 4)` respectively.
Druid SQL currently does not support bitwise operations (#8560), which makes
these expressions even more valuable IMO.
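As a rough illustration of the flag extraction described above, here is a minimal Python sketch. It is not Druid code: the mask constants and the `extract_indicators` helper are hypothetical names, and it assumes `bitwiseAnd` behaves like an ordinary integer AND, so the result is the masked value (0, 2, or 4) rather than a 0/1 indicator.

```python
# Hypothetical mask values taken from the example above.
MOBILE = 1          # order was placed on mobile
WEB = 2             # order was purchased on web
FIRST_PURCHASE = 4  # user's first purchase


def extract_indicators(flags: int) -> dict:
    """Mimic bitwiseAnd(flags, mask) for each flag (plain integer AND)."""
    return {
        "mobile_ind": flags & MOBILE,
        "web_ind": flags & WEB,
        "first_purchase_ind": flags & FIRST_PURCHASE,
    }


def as_booleans(indicators: dict) -> dict:
    """Collapse masked values to 0/1, since AND returns the mask itself."""
    return {name: int(value != 0) for name, value in indicators.items()}


# flags = 5 is binary 101: mobile and first purchase are set, web is not.
raw = extract_indicators(5)
print(raw)
print(as_booleans(raw))
```

If a strict 0/1 dimension is wanted, the masked value would need to be normalized after the AND, as `as_booleans` does here.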
**Tested on a local cluster**
Ingestion spec:
```
{
  "type": "index_parallel",
  "spec": {
    "ioConfig": {
      "type": "index_parallel",
      "inputSource": {
        "type": "inline",
        "data": "\"x\",\"y\"\n4,2\n8,4\n16,8\n3,2\n5,2"
      },
      "inputFormat": {
        "type": "csv",
        "findColumnsFromHeader": true
      }
    },
    "tuningConfig": {
      "type": "index_parallel",
      "partitionsSpec": {
        "type": "dynamic"
      }
    },
    "dataSchema": {
      "dataSource": "bitwise_test",
      "granularitySpec": {
        "type": "uniform",
        "queryGranularity": "NONE",
        "rollup": false,
        "segmentGranularity": "YEAR"
      },
      "timestampSpec": {
        "column": "!!!_no_such_column_!!!",
        "missingValue": "2010-01-01T00:00:00Z"
      },
      "transformSpec": {
        "transforms": [
          {
            "type": "expression",
            "name": "zBitwiseAnd",
            "expression": "bitwiseAnd(CAST(x, 'LONG'), CAST(y, 'LONG'))"
          },
          {
            "type": "expression",
            "name": "zBitwiseOr",
            "expression": "bitwiseOr(CAST(x, 'LONG'), CAST(y, 'LONG'))"
          },
          {
            "type": "expression",
            "name": "zBitwiseComplement",
            "expression": "bitwiseComplement(CAST(x, 'LONG'))"
          },
          {
            "type": "expression",
            "expression": "bitwiseShiftLeft(CAST(x, 'LONG'), CAST(y, 'LONG'))",
            "name": "zBitwiseShiftLeft"
          },
          {
            "type": "expression",
            "name": "zBitwiseShiftRight",
            "expression": "bitwiseShiftRight(CAST(x, 'LONG'), CAST(y, 'LONG'))"
          },
          {
            "type": "expression",
            "name": "zBitwiseXor",
            "expression": "bitwiseXor(CAST(x, 'LONG'), CAST(y, 'LONG'))"
          }
        ]
      },
      "dimensionsSpec": {
        "dimensions": [
          {
            "type": "long",
            "name": "x"
          },
          {
            "type": "long",
            "name": "y"
          },
          {
            "type": "long",
            "name": "zBitwiseAnd"
          },
          {
            "type": "long",
            "name": "zBitwiseComplement"
          },
          {
            "type": "long",
            "name": "zBitwiseOr"
          },
          {
            "type": "long",
            "name": "zBitwiseShiftLeft"
          },
          {
            "type": "long",
            "name": "zBitwiseShiftRight"
          },
          {
            "type": "long",
            "name": "zBitwiseXor"
          }
        ]
      }
    }
  }
}
```
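
For anyone checking the ingested rows by hand, the following Python sketch computes the values one would expect each transform to produce for the inline data in the spec above. It is not output captured from the cluster. It assumes the new expressions follow Java `long` operator semantics (`&`, `|`, `^`, `~`, `<<`, and an arithmetic `>>`); Python integers are unbounded, but that makes no difference for inputs this small. The `expected_row` helper is a hypothetical name.

```python
import csv
import io

# Same inline CSV as the "data" field of the ingestion spec.
INLINE_DATA = '"x","y"\n4,2\n8,4\n16,8\n3,2\n5,2'


def expected_row(x: int, y: int) -> dict:
    """Values each transform should yield, assuming Java-like semantics."""
    return {
        "x": x,
        "y": y,
        "zBitwiseAnd": x & y,
        "zBitwiseOr": x | y,
        "zBitwiseXor": x ^ y,
        "zBitwiseComplement": ~x,      # two's complement: ~x == -x - 1
        "zBitwiseShiftLeft": x << y,
        "zBitwiseShiftRight": x >> y,  # assumed arithmetic (sign-preserving)
    }


rows = [
    expected_row(int(record["x"]), int(record["y"]))
    for record in csv.DictReader(io.StringIO(INLINE_DATA))
]

for row in rows:
    print(row)
```

For the first row (`x=4`, `y=2`) this gives 0 for AND, 6 for OR and XOR, -5 for the complement, 16 for the left shift, and 1 for the right shift.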

<hr>
This PR has:
- [x] been self-reviewed.
- [x] added documentation for new or modified features or behaviors.
- [x] added comments explaining the "why" and the intent of the code
wherever it would not be obvious to an unfamiliar reader.
- [x] been tested in a test Druid cluster.