vogievetsky opened a new issue #7952: Transform specs are ignored if dimensions auto detection is used
URL: https://github.com/apache/incubator-druid/issues/7952

The column auto-detection feature does not find the new columns created by transforms.

### Affected Version

All versions of Druid so far (up to 0.15.0)

### Description

Say you have data:

```
{"a":"hello","b":"world"}
{"a":"where","c":"to go"}
```

in a file that lives at `/Users/vadim/Downloads/test-data.json`, and you ingest it with:

```
{
  "dataSchema": {
    "dataSource": "Downloads",
    "parser": {
      "type": "string",
      "parseSpec": {
        "format": "json",
        "timestampSpec": {
          "column": "!!!_no_such_column_!!!",
          "missingValue": "2010-01-01T00:00:00Z"
        },
        "dimensionsSpec": {}
      }
    },
    "metricsSpec": [
      {
        "name": "count",
        "type": "count"
      }
    ],
    "granularitySpec": {
      "type": "uniform",
      "segmentGranularity": "DAY",
      "queryGranularity": "HOUR",
      "rollup": true,
      "intervals": null
    },
    "transformSpec": {
      "filter": null,
      "transforms": [
        {
          "type": "expression",
          "name": "a_prime",
          "expression": "concat(\"a\",'_prime')"
        }
      ]
    }
  },
  "ioConfig": {
    "type": "index_parallel",
    "firehose": {
      "type": "local",
      "baseDir": "/Users/vadim/Downloads",
      "filter": "test-data.json"
    },
    "appendToExisting": false
  },
  "tuningConfig": {
    "type": "index_parallel",
    "maxRowsPerSegment": null,
    "maxRowsInMemory": 1000000,
    "maxBytesInMemory": 0,
    "maxTotalRows": null,
    "numShards": null,
    "indexSpec": {
      "bitmap": {
        "type": "concise"
      },
      "dimensionCompression": "lz4",
      "metricCompression": "lz4",
      "longEncoding": "longs"
    },
    "maxPendingPersists": 0,
    "forceGuaranteedRollup": false,
    "reportParseExceptions": false,
    "pushTimeout": 0,
    "segmentWriteOutMediumFactory": null,
    "maxNumSubTasks": 1,
    "maxRetry": 3,
    "taskStatusCheckPeriodMs": 1000,
    "chatHandlerTimeout": "PT10S",
    "chatHandlerNumRetries": 5,
    "logParseExceptions": false,
    "maxParseExceptions": 2147483647,
    "maxSavedParseExceptions": 0,
    "partitionDimensions": [],
    "buildV9Directly": true
  },
  "type": "index_parallel"
}
```

Notice how I am trying to create an `a_prime` column with a transform spec. Because `dimensionsSpec` is empty, the dimensions are auto-detected from the input rows.

The job succeeds, but when you query the data you see that there is no `a_prime` column.

It would be great (and would make a ton more sense) if the transforms added themselves to the column list coming from the file.
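
A possible workaround, offered as an untested sketch: since the problem is specific to auto detection, listing the dimensions explicitly should let the transform output be ingested. The fragment below is a hypothetical replacement for the `parseSpec` in the spec above. It assumes that an explicitly listed dimension may refer to a transform's output name, which this issue does not itself confirm.

```json
{
  "parseSpec": {
    "format": "json",
    "timestampSpec": {
      "column": "!!!_no_such_column_!!!",
      "missingValue": "2010-01-01T00:00:00Z"
    },
    "dimensionsSpec": {
      "dimensions": ["a", "b", "c", "a_prime"]
    }
  }
}
```

The obvious cost is that every input column has to be named up front, which gives up the schemaless behavior that auto detection is meant to provide.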
