vogievetsky opened a new issue #7952: Transform specs are ignored if dimensions 
auto detection is used
URL: https://github.com/apache/incubator-druid/issues/7952
 
 
   Dimension auto-detection does not pick up the new columns created by transforms.
   
   ### Affected Version
   
   All versions of Druid so far (up to 0.15.0)
   
   ### Description
   
   Say you have data:
   ```
   {"a":"hello","b":"world"}
   {"a":"where","c":"to go"}
   ```
   
   In a file that lives at: `/Users/vadim/Downloads/test-data.json`
   
   And you ingest it with:
   ```
   {
     "dataSchema": {
       "dataSource": "Downloads",
       "parser": {
         "type": "string",
         "parseSpec": {
           "format": "json",
           "timestampSpec": {
             "column": "!!!_no_such_column_!!!",
             "missingValue": "2010-01-01T00:00:00Z"
           },
           "dimensionsSpec": {}
         }
       },
       "metricsSpec": [
         {
           "name": "count",
           "type": "count"
         }
       ],
       "granularitySpec": {
         "type": "uniform",
         "segmentGranularity": "DAY",
         "queryGranularity": "HOUR",
         "rollup": true,
         "intervals": null
       },
       "transformSpec": {
         "filter": null,
         "transforms": [
           {
             "type": "expression",
             "name": "a_prime",
             "expression": "concat(\"a\",'_prime')"
           }
         ]
       }
     },
     "ioConfig": {
       "type": "index_parallel",
       "firehose": {
         "type": "local",
         "baseDir": "/Users/vadim/Downloads",
         "filter": "test-data.json"
       },
       "appendToExisting": false
     },
     "tuningConfig": {
       "type": "index_parallel",
       "maxRowsPerSegment": null,
       "maxRowsInMemory": 1000000,
       "maxBytesInMemory": 0,
       "maxTotalRows": null,
       "numShards": null,
       "indexSpec": {
         "bitmap": {
           "type": "concise"
         },
         "dimensionCompression": "lz4",
         "metricCompression": "lz4",
         "longEncoding": "longs"
       },
       "maxPendingPersists": 0,
       "forceGuaranteedRollup": false,
       "reportParseExceptions": false,
       "pushTimeout": 0,
       "segmentWriteOutMediumFactory": null,
       "maxNumSubTasks": 1,
       "maxRetry": 3,
       "taskStatusCheckPeriodMs": 1000,
       "chatHandlerTimeout": "PT10S",
       "chatHandlerNumRetries": 5,
       "logParseExceptions": false,
       "maxParseExceptions": 2147483647,
       "maxSavedParseExceptions": 0,
       "partitionDimensions": [],
       "buildV9Directly": true
     },
     "type": "index_parallel"
   }
   ```
   
   Notice how I am trying to create an `a_prime` column with a transform spec.
   
   The job succeeds, but when you query the data:
   
   
![image](https://user-images.githubusercontent.com/177816/60033822-28f8d880-965e-11e9-9cac-557fcc714b87.png)
   
   You see that there is no `a_prime` column.
   
   It would be great (and would make a ton more sense) if the transforms added themselves to the column list coming from the file.
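   
   A possible workaround until this is fixed, sketched under an assumption this report does not confirm: that explicitly listed dimensions are resolved against transform outputs. If so, listing the dimensions by hand, including `a_prime`, should bring the column in. The input columns `a`, `b` and `c` would then need to be listed too, since naming any dimension should switch auto-detection off. The fragment below would replace the empty `dimensionsSpec` in the spec above:
   
   ```json
   "dimensionsSpec": {
     "dimensions": ["a", "b", "c", "a_prime"]
   }
   ```
   
   This gives up auto-detection entirely, so it only helps when the input columns are known ahead of time.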
