suneet-s opened a new issue #9589: TransformSpec for firehoses appear to perform the operation twice
URL: https://github.com/apache/druid/issues/9589

### Affected Version

Tested in 0.18

### Description

I am writing integration tests for transform specs and noticed that when using a transform spec with a parser, the transformation is being applied twice.

See the below ingestion spec. You can re-create this by sym-linking `/resources` to `$DRUID_CODEBASE/integration-tests/src/test/resources`

```
{
  "type": "index",
  "spec": {
    "dataSchema": {
      "dataSource": "wiki-tests-2",
      "metricsSpec": [
        { "type": "count", "name": "count" },
        { "type": "doubleSum", "name": "added", "fieldName": "added" },
        { "type": "doubleSum", "name": "triple-added", "fieldName": "triple-added" },
        { "type": "doubleSum", "name": "deleted", "fieldName": "deleted" },
        { "type": "doubleSum", "name": "delta", "fieldName": "delta" },
        { "name": "thetaSketch", "type": "thetaSketch", "fieldName": "user" },
        { "name": "quantilesDoublesSketch", "type": "quantilesDoublesSketch", "fieldName": "delta" },
        { "name": "HLLSketchBuild", "type": "HLLSketchBuild", "fieldName": "user" }
      ],
      "granularitySpec": {
        "segmentGranularity": "DAY",
        "queryGranularity": "second",
        "intervals": [ "2013-08-31/2013-09-02" ]
      },
      "parser": {
        "parseSpec": {
          "format": "json",
          "timestampSpec": { "column": "timestamp" },
          "dimensionsSpec": {
            "dimensions": [
              "page",
              "language",
              "user",
              "unpatrolled",
              "newPage",
              "robot",
              "anonymous",
              "namespace",
              "continent",
              "country",
              "region",
              "city"
            ]
          }
        }
      },
      "transformSpec": {
        "transforms": [
          {
            "type": "expression",
            "name": "language",
            "expression": "concat('l-', language)"
          },
          {
            "type": "expression",
            "name": "triple-added",
            "expression": "added * 3"
          }
        ]
      }
    },
    "ioConfig": {
      "type": "index",
      "firehose": {
        "type": "local",
        "baseDir": "/resources/data/batch_index",
        "filter": "wikipedia_index_data*"
      }
    },
    "tuningConfig": {
      "type": "index",
      "maxRowsPerSegment": 10
    }
  }
}
```

However if you switch to the new format (inputSource/inputFormat instead of Firehoses), it will perform the operation as expected.
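
To make the symptom concrete, here is a minimal Python sketch of what "applied twice" could look like for the two transforms in the spec. This is not Druid code: `apply_transforms` and the sample row are hypothetical, and it assumes the second pass simply re-runs both expressions on the already transformed row. Under that assumption, only the transform that overwrites its own input (`language`) would visibly differ, while `triple-added` reads the untouched `added` field and would come out the same either way.

```python
# Hypothetical stand-in for the two expression transforms in the spec above.
# This only models the expressions; it is not how Druid evaluates them.
def apply_transforms(row):
    out = dict(row)
    out["language"] = "l-" + row["language"]   # concat('l-', language)
    out["triple-added"] = row["added"] * 3     # added * 3
    return out

# Made-up sample row, for illustration only.
row = {"language": "en", "added": 10}

once = apply_transforms(row)    # the expected behaviour
twice = apply_transforms(once)  # assumed behaviour when the transform runs twice

print(once["language"], once["triple-added"])
print(twice["language"], twice["triple-added"])
```

If this model matches the bug, checking the `language` dimension for a doubled prefix would be a more sensitive test than checking `triple-added`.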
