jwitko commented on code in PR #13747:
URL: https://github.com/apache/druid/pull/13747#discussion_r1102885445


##########
helm/druid/templates/broker/role.yaml:
##########
@@ -0,0 +1,21 @@
+{{- if .Values.rbac.create }}
+apiVersion: rbac.authorization.k8s.io/v1
+kind: Role
+metadata:
+  name: {{ template "druid.broker.fullname" . }}
+  labels:
+    app: {{ include "druid.name" . }}
+    chart: {{ include "druid.chart" . }}
+    component: {{ .Values.broker.name }}
+    release: {{ .Release.Name }}
+    heritage: {{ .Release.Service }}
+rules:
+  - apiGroups:
+      - ""
+    resources:
+      - pods
+      - configmaps
+    verbs:
+      - '*'

Review Comment:
   An update:
   - Just an FYI: these are the role settings recommended in the official 
Druid documentation for these extensions.
   - Converting my dev cluster over to this version of the helm chart has 
run into some very frustrating complications. For some reason, which as far 
as I can tell has nothing to do with service accounts, I never get the 
   `2023-02-09T21:56:09,734 INFO [k8s-task-runner-3] org.apache.druid.indexing.overlord.TaskQueue - Received SUCCESS status for task: <some_task>` 
message after a job finishes when the cluster is deployed via this version of 
the helm chart.
   - This means a task of `{"type":"noop"}` spawns a k8s job, the job 
completes successfully with a zero return code, and yet the status is 
reported as `FAILED` in the UI.
   - The logs for that, in debug mode, look like this:
   ```
   2023-02-09T21:55:51,559 DEBUG [k8s-task-runner-25] org.apache.http.wire - http-outgoing-22 >> "2023-02-09T21:39:13,653 DEBUG [task-runner-0-priority-0] org.apache.druid.indexing.common.actions.RemoteTaskActionClient - Performing action for task[noop_2023-02-09T21:39:02.008Z_936917b3-a0af-4656-a1e1-9e77ced24d92]: UpdateStatusAction{status=successful}[\n]"
   2023-02-09T21:55:51,560 DEBUG [k8s-task-runner-25] org.apache.http.wire - http-outgoing-22 >> "2023-02-09T21:39:13,663 DEBUG [task-runner-0-priority-0] org.apache.druid.indexing.common.actions.RemoteTaskActionClient - Performing action for task[noop_2023-02-09T21:39:02.008Z_936917b3-a0af-4656-a1e1-9e77ced24d92]: UpdateLocationAction{taskLocation=TaskLocation{host='null', port=-1, tlsPort=-1}}[\n]"
   2023-02-09T21:55:51,560 DEBUG [k8s-task-runner-25] org.apache.http.wire - http-outgoing-22 >> "2023-02-09T21:39:13,671 DEBUG [task-runner-0-priority-0] org.apache.druid.indexing.overlord.TaskRunnerUtils - Task [noop_2023-02-09T21:39:02.008Z_936917b3-a0af-4656-a1e1-9e77ced24d92] status changed to [SUCCESS].[\n]"
   2023-02-09T21:55:51,560 DEBUG [k8s-task-runner-25] org.apache.http.wire - http-outgoing-22 >> "  "id" : "noop_2023-02-09T21:39:02.008Z_936917b3-a0af-4656-a1e1-9e77ced24d92",[\n]"
   2023-02-09T21:55:51,561 DEBUG [k8s-task-runner-25] org.apache.http.wire - http-outgoing-22 >> "2023-02-09T21:39:13,737 INFO [main] org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner - Starting graceful shutdown of task[noop_2023-02-09T21:39:02.008Z_936917b3-a0af-4656-a1e1-9e77ced24d92].[\n]"
   2023-02-09T21:55:51,561 DEBUG [k8s-task-runner-25] org.apache.http.wire - http-outgoing-22 >> "af-4656-a1e1-9e77ced24d92] status changed to [FAILED].[\n]"
   ```
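
The excerpt above reports the same task as `SUCCESS` and then as `FAILED`. As a rough aid for spotting that kind of flip in a longer debug log, here is a minimal Python sketch that extracts every `status changed to [...]` event in order. The function name `status_transitions` and the `sample` string are hypothetical helpers for illustration, not part of Druid; the sample is abridged from the lines above.

```python
import re
from typing import List

# Matches "status changed to [STATE]"; \s+ tolerates hard-wrapped log text.
STATUS_RE = re.compile(r"status\s+changed\s+to\s+\[([A-Z]+)\]")


def status_transitions(log_text: str) -> List[str]:
    """Return every reported task state in the order it appears in the log."""
    return STATUS_RE.findall(log_text)


# Two fragments abridged from the wire log above (illustrative only).
sample = (
    "Task [noop_2023-02-09T21:39:02.008Z_936917b3-a0af-4656-a1e1-9e77ced24d92] "
    "status changed to [SUCCESS].\n"
    "af-4656-a1e1-9e77ced24d92] status changed to [FAILED].\n"
)

print(status_transitions(sample))  # prints ['SUCCESS', 'FAILED']
```

A list that ends in a different state than it started with narrows down where in the log to look; it does not by itself say which component changed the status.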
   
   The job status in the UI shows as:
   ```
   {
     "id": "noop_2023-02-09T21:39:02.008Z_936917b3-a0af-4656-a1e1-9e77ced24d92",
     "groupId": "noop_2023-02-09T21:39:02.008Z_936917b3-a0af-4656-a1e1-9e77ced24d92",
     "type": "noop",
     "createdTime": "2023-02-09T21:39:02.014Z",
     "queueInsertionTime": "1970-01-01T00:00:00.000Z",
     "statusCode": "FAILED",
     "status": "FAILED",
     "runnerStatusCode": "WAITING",
     "duration": -1,
     "location": {
       "host": "10.233.68.10",
       "port": 8100,
       "tlsPort": -1
     },
     "dataSource": "none",
     "errorMsg": "Task failed %s: [ noop_2023-02-09T21:39:02.008Z_936917b3-a0af-4656-a1e1-9e77ced24d92, noop20230209t2..."
   }
   ```
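
Several fields in that payload look like placeholders rather than real values. The Python sketch below flags them in a status payload of this shape. The heuristics (a terminal `statusCode` with a `WAITING` runner, a `duration` of -1, an epoch-zero `queueInsertionTime`, a literal `%s` in `errorMsg`) are assumptions made for illustration, not documented Druid semantics, and some of these values may be normal for completed tasks. The function name `suspicious_fields` is hypothetical.

```python
import json
from typing import Any, Dict, List


def suspicious_fields(status: Dict[str, Any]) -> List[str]:
    """List fields that look unset or contradictory (heuristics are assumptions)."""
    findings = []
    # Terminal task state while the runner still reports WAITING.
    terminal = status.get("statusCode") in ("SUCCESS", "FAILED")
    if terminal and status.get("runnerStatusCode") == "WAITING":
        findings.append("runnerStatusCode")
    # A duration of -1 usually means "unknown".
    if status.get("duration") == -1:
        findings.append("duration")
    # Epoch-zero timestamp.
    if (status.get("queueInsertionTime") or "").startswith("1970-01-01"):
        findings.append("queueInsertionTime")
    # A literal "%s" suggests a format string that was never filled in.
    if "%s" in (status.get("errorMsg") or ""):
        findings.append("errorMsg")
    return findings


# Subset of the payload shown above; the truncated errorMsg is kept as-is.
payload = json.loads("""
{
  "statusCode": "FAILED",
  "runnerStatusCode": "WAITING",
  "duration": -1,
  "queueInsertionTime": "1970-01-01T00:00:00.000Z",
  "errorMsg": "Task failed %s: [ noop_2023-02-09T21:39:02.008Z_936917b3-a0af-4656-a1e1-9e77ced24d92, noop20230209t2..."
}
""")

print(suspicious_fields(payload))
```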
   
   The logs from the actual job itself show a `noop` task with `success`:
   ```
   2023-02-09T16:53:23,371 INFO [task-runner-0-priority-0] org.apache.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
     "id" : "noop_2023-02-09T16:53:12.037Z_3c8e9f26-6ca4-4600-8723-572d2c8baf48",
     "status" : "SUCCESS",
     "duration" : 3887,
     "errorMsg" : null,
     "location" : {
       "host" : null,
       "port" : -1,
       "tlsPort" : -1
     }
   }
   ```
   I'm at a complete loss at this point, so if anyone wants to help, I'm all 
ears and would welcome any kind of paired troubleshooting. I've gone as far 
as giving all the service accounts full cluster-admin permissions, but again, 
I don't think this has anything to do with service accounts.
   
   
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

