jihoonson commented on a change in pull request #10620:
URL: https://github.com/apache/druid/pull/10620#discussion_r534389085
##########
File path:
integration-tests/src/test/resources/indexer/wikipedia_http_inputsource_task.json
##########
@@ -0,0 +1,90 @@
+{
+ "type": "index_parallel",
+ "spec": {
+ "dataSchema": {
+ "dataSource": "%%DATASOURCE%%",
+ "timestampSpec": {
+ "column": "timestamp"
+ },
+ "dimensionsSpec": {
+ "dimensions": [
+ "page",
+ {"type": "string", "name": "language", "createBitmapIndex": false},
+ "user",
+ "unpatrolled",
+ "newPage",
+ "robot",
+ "anonymous",
+ "namespace",
+ "continent",
+ "country",
+ "region",
+ "city"
+ ]
+ },
+ "metricsSpec": [
+ {
+ "type": "count",
+ "name": "count"
+ },
+ {
+ "type": "doubleSum",
+ "name": "added",
+ "fieldName": "added"
+ },
+ {
+ "type": "doubleSum",
+ "name": "deleted",
+ "fieldName": "deleted"
+ },
+ {
+ "type": "doubleSum",
+ "name": "delta",
+ "fieldName": "delta"
+ },
+ {
+ "name": "thetaSketch",
+ "type": "thetaSketch",
+ "fieldName": "user"
+ },
+ {
+ "name": "quantilesDoublesSketch",
+ "type": "quantilesDoublesSketch",
+ "fieldName": "delta"
+ },
+ {
+ "name": "HLLSketchBuild",
+ "type": "HLLSketchBuild",
+ "fieldName": "user"
+ }
+ ],
+ "granularitySpec": {
+ "segmentGranularity": "DAY",
+ "queryGranularity": "second",
+ "intervals" : [ "2016-06/P1M" ]
+ }
+ },
+ "ioConfig": {
+ "type": "index_parallel",
+ "inputSource": {
+ "type": "http",
+ "uris": ["https://druid.apache.org/data/wikipedia.json.gz",
+ "https://druid.apache.org/data/wikipedia.json.gz"]
Review comment:
Good question. This is mostly for future-proofing. This task spec has
`maxNumFiles` set to 1, so it creates one subtask per file. Currently, the task
always runs in parallel mode if `maxNumConcurrentSubTasks` > 1, no matter how
many subtasks actually get created. In the future, though, I think it should
take the actual number of subtasks into account when determining its running
mode (parallel vs. sequential). Reading 2 files ensures that this task always
requires 2 subtasks and thus always runs in parallel mode.
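To make the reasoning concrete, here is a minimal Python sketch of the two running-mode policies described above. It is an illustration only, not Druid's implementation: `num_subtasks`, `current_mode`, and `future_mode` are hypothetical names, a `maxNumConcurrentSubTasks` value of 2 is assumed, and splitting is simplified to count files only (a real split hint may also apply a size limit).

```python
import math

# Hypothetical helpers illustrating the review comment; these are NOT Druid APIs.


def num_subtasks(input_files, max_num_files):
    """Number of splits when each split holds at most max_num_files files.

    Simplification (assumption): ignores any byte-size limit on a split.
    """
    return math.ceil(len(input_files) / max_num_files)


def current_mode(max_num_concurrent_sub_tasks):
    """Current behavior per the comment: parallel whenever concurrency > 1."""
    return "parallel" if max_num_concurrent_sub_tasks > 1 else "sequential"


def future_mode(max_num_concurrent_sub_tasks, subtask_count):
    """Possible future behavior: also require more than one actual subtask."""
    if max_num_concurrent_sub_tasks > 1 and subtask_count > 1:
        return "parallel"
    return "sequential"


uris = [
    "https://druid.apache.org/data/wikipedia.json.gz",
    "https://druid.apache.org/data/wikipedia.json.gz",
]
n = num_subtasks(uris, max_num_files=1)  # one subtask per file
print(n, current_mode(2), future_mode(2, n))  # 2 parallel parallel
print(1, current_mode(2), future_mode(2, 1))  # 1 parallel sequential
```

With two input files and `maxNumFiles` = 1, both policies pick parallel mode; with a single file, only the current policy would still run in parallel, which is why the spec reads the same file twice.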