t0mpere opened a new issue, #17014:
URL: https://github.com/apache/pinot/issues/17014

   Based on this [discussion](https://apache-pinot.slack.com/archives/C011C9JHN7R/p1759937177991909) on Slack.
   
   I've found a bug: `output.segment.dir.uri` is never read from the task 
config; it is always taken from the controller config. This leads to an edge 
case: if the deep store is not configured globally, it's impossible to run a 
task with `push.mode` set to `METADATA`. 
   
   The function `getPushTaskConfig` should prioritise the task config over the 
global controller config.
   
   I will open a PR to fix and refactor the function if we agree on the 
behaviour.
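
The proposed precedence could look roughly like the sketch below. It is only an illustration, not Pinot's actual `getPushTaskConfig`: the class `PushConfigPrecedence`, the helper `resolvePushConfig`, and its parameters are hypothetical, and the `TAR` default and the `<data dir>/<table name>` fallback path are assumptions inferred from the Result shown in the example that follows.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of "task config wins over controller config".
// None of these names come from the Pinot code base.
final class PushConfigPrecedence {
    static final String PUSH_MODE = "push.mode";
    static final String OUTPUT_SEGMENT_DIR_URI = "output.segment.dir.uri";

    static Map<String, String> resolvePushConfig(
            String rawTableName,
            Map<String, String> taskConfig,
            String controllerDataDir) {
        // Start from the task-level settings so they are never overwritten.
        Map<String, String> resolved = new HashMap<>(taskConfig);

        // Fall back to the controller data dir only when the task config
        // does not provide an output directory (assumed fallback layout).
        String taskOutputDir = taskConfig.get(OUTPUT_SEGMENT_DIR_URI);
        if (taskOutputDir == null || taskOutputDir.isEmpty()) {
            resolved.put(OUTPUT_SEGMENT_DIR_URI, controllerDataDir + "/" + rawTableName);
        }

        // Keep the task's push mode; default to TAR only when it is absent
        // (assumed default).
        resolved.putIfAbsent(PUSH_MODE, "TAR");
        return resolved;
    }
}
```

With the task config from the example, this sketch would keep `push.mode` as `METADATA` and `output.segment.dir.uri` as `gs://my-bucket/LOADED_HOURLY/merged`; the controller's `controller.data.dir` would only be used when the task config omits the output directory.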
   
   Example: 
   
   #### Controller config
   
   ```
   controller.data.dir=/var/pinot/controller/data
   ```
   
   
   #### Task config
   ```
   "MergeRollupTask": {
     "1day.mergeType": "concat",
     "1day.bucketTimePeriod": "1d",
     "1day.bufferTimePeriod": "1d",
     "1day.maxNumRecordsPerSegment": "100000",
     "1day.maxNumRecordsPerTask": "500000",
     "1day.maxNumParallelBuckets": "10",
     "minionInstanceTag": "merge",
     "push.mode": "METADATA",
     "output.segment.dir.uri": "gs://my-bucket/LOADED_HOURLY/merged",
     "schedule": "0 1 * * * ?"
   }
   ```
   #### Result
   ```
   {
     "configs": {
       "push.mode": "TAR",
       ...
       "output.segment.dir.uri": "/var/pinot/controller/data/LOADED_HOURLY",
       ...
     },
     "tableName": "LOADED_HOURLY_OFFLINE",
     "taskId": "Task_MergeRollupTask_5cd6364b-7012-4f60-8e8b-58ee4a2196c1_1759943074336_0",
     "taskType": "MergeRollupTask"
   }
   ```
   
   #### Expected
   ```
   {
     "configs": {
       "push.mode": "METADATA",
       ...
       "output.segment.dir.uri": "gs://my-bucket/LOADED_HOURLY/merged",
       ...
     },
     "tableName": "LOADED_HOURLY_OFFLINE",
     "taskId": "Task_MergeRollupTask_5cd6364b-7012-4f60-8e8b-58ee4a2196c1_1759943074336_0",
     "taskType": "MergeRollupTask"
   }
   ```
   cc: @shounakmk219 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]