teyeheimans opened a new issue #12393:
URL: https://github.com/apache/druid/issues/12393


   ### Description
   
   Since [my last feature request](https://github.com/apache/druid/issues/8560) 
the bitwise operators have been added. These are VERY helpful and work _almost_ 
perfect.
   
   The problem now is that when we want to use them, you always have to make 
use of a virtual column. For example:
   
   A simple query like: 
   ```sql
   SELECT * FROM "table" WHERE BITWISE_AND("flags", 4) = 0 LIMIT 10
   ```
   
   Converts into:
   
   ```json
   {
     "queryType": "scan",
     ...
     "virtualColumns": [
       {
         "type": "expression",
         "name": "v0",
         "expression": "bitwiseAnd(\"flags\",4)",
         "outputType": "LONG"
       }
     ],
     ...
     "filter": {
       "type": "selector",
       "dimension": "v0",
       "value": "0",
       "extractionFn": null
     },
    ...
   }
   ```
   
   As you can see, due to the fact that we use a virtual column, we are able to 
use the bitwise expressions. 
   
   However, druid has some other places where records can be filtered. For 
example, the 
[DruidInputSource](https://druid.apache.org/docs/latest/ingestion/native-batch.html#druid-input-source)
 and the [Filtered 
Aggregator](https://druid.apache.org/docs/latest/querying/aggregations.html#filtered-aggregator).
   
   In these places it is not possible to use a virtual column, and thus we 
cannot use the bitwise expressions. 
   
   Example. Assume a table where we store the role a user can have in a "roles" 
field. This field is a digit, where each bit represents a role.
   
   Thus:
   1 = Content moderator
   2 = Financial user
   4 = Data analyst
   8 = Sales 
   ... etc 
   
   A person can have multiple roles of course. So somebody with a "roles" value 
of 5 is both Content moderator as data analyst.
   
   Now, back to druid.
   
   When I want to create a new druid dataSource and fill it with a native batch 
ingestion task with data from another druid dataSource, I cannot filter on the 
"roles" as I want to. See the "filter" part below: 
    
   ```json
   {
     "type": "index_parallel",
     "spec": {
       "dataSchema": {
         "dataSource": "dataAnalystUsers",
         "timestampSpec": { ... },
         "dimensionsSpec": { ... },
         "metricsSpec": [ ... ],
         "granularitySpec": { ... }
       },
       "ioConfig": {
         "type": "index_parallel",
         "inputSource": {
           "type": "druid",
           "dataSource": "users",
           "interval": "2022-02-12/P1D",
           "filter": {
              'type': '????',
              'dimension': 'roles',
              'value': 4
           }
         }
       },
       "tuningConfig": { ... }
     }
   }
   ```  
   
   As you can see, we have no way to filter here using the bitwise expressions. 
My suggestion is to create native filter types for the bitwise expressions:
   
   Maybe something like below, but this is only a idea.
   ```json
   {
    'filter': {
       'type': 'bitwiseAnd',
       'dimension': '<DIMENSION>'
       'value': <VALUE>,
       'expected': <VALUE> 
     }
   }
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to