jacobtolar opened a new pull request #9192: Add ReplaceMissingValueExtractionFn
URL: https://github.com/apache/druid/pull/9192
 
 
   ### Description
   
   This adds a new built-in extraction function that replaces missing (null) 
values with a provided value. I'd be happy to learn that a good way to do this 
already exists; in the meantime, I ended up with a query like this: 
   ```
   ...
     "dimensions": [
       {
         "type": "extraction",
         "dimension": "field_name",
         "outputName": "output_field_name",
         "extractionFn": {
           "type": "regex",
           "expr": "(.+)",
           "replaceMissingValue": true,
           "replaceMissingValueWith": "0"
         }
       },
   ...
   ```
   
   This is obviously a roundabout way to get what I want, but I couldn't find 
any other way to do it with the existing built-in extraction functions. (It 
also can't quite be done with the map-based `lookup` extractor.) If there's a 
better approach, I'd be glad to hear about it. 
   
   This PR provides an extraction function that allows you to directly execute 
this 'coalesce null values' operation. You'd configure it like this:
   
   ```
   ...
     "dimensions": [
       {
         "type": "extraction",
         "dimension": "field_name",
         "outputName": "output_field_name",
         "extractionFn": {
           "type": "replaceMissing",
           "replaceMissingValueWith": "0"
         }
       },
   ...
   ```
   
   It does this by extending `FunctionalExtraction` (passing in the identity 
function) and configuring the null replacement options in that class. Nothing 
too complicated.
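
To illustrate the idea (not the actual code in this PR), here is a minimal, self-contained sketch. `FunctionalExtractionSketch` and `ReplaceMissingSketch` are hypothetical stand-ins that do not depend on Druid, and treating both `null` and the empty string as "missing" is an assumption of this sketch rather than a statement about Druid's null handling.

```java
import java.util.function.Function;

// Hypothetical, simplified stand-in for a "functional extraction" base class.
// It applies a function to the input and substitutes a replacement when the
// result is missing. This is NOT Druid's FunctionalExtraction.
abstract class FunctionalExtractionSketch {
    private final Function<String, String> extractionFunction;
    private final String replaceMissingValueWith;

    FunctionalExtractionSketch(Function<String, String> extractionFunction,
                               String replaceMissingValueWith) {
        this.extractionFunction = extractionFunction;
        this.replaceMissingValueWith = replaceMissingValueWith;
    }

    public String apply(String value) {
        // Assumption: a null input stays null rather than being passed to the function.
        String extracted = (value == null) ? null : extractionFunction.apply(value);
        // Assumption: null and "" both count as missing.
        boolean missing = (extracted == null || extracted.isEmpty());
        return missing ? replaceMissingValueWith : extracted;
    }
}

// The "replace missing" function is the base class with the identity function
// plugged in, so present values pass through unchanged.
final class ReplaceMissingSketch extends FunctionalExtractionSketch {
    ReplaceMissingSketch(String replaceMissingValueWith) {
        super(Function.identity(), replaceMissingValueWith);
    }
}
```

With a replacement of `"0"`, this sketch maps a missing input to `"0"` and returns any other input as-is, which is the behavior the `replaceMissing` configuration above is meant to express.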
   
   If this seems useful, I'm happy to update documentation, tests, etc. as 
needed in this PR, but I didn't want to go through that effort if there is 
already a better way to do this. 
   
   Thanks!
   
   <hr>
   
   This PR has:
   - [x] been self-reviewed, *although I'm only passingly familiar with the 
Druid codebase*.
      - [ ] using the [concurrency 
checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md)
 (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked 
related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in 
[licenses.yaml](https://github.com/apache/druid/blob/master/licenses.yaml)
   - [ ] added comments explaining the "why" and the intent of the code 
wherever it would not be obvious for an unfamiliar reader.
   - [x] added unit tests or modified existing tests to cover new code paths.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   
   
   <hr>
   
   ##### Key changed/added classes in this PR
    * `ReplaceMissingValueExtractionFn`
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
