pierrejeambrun commented on code in PR #58814:
URL: https://github.com/apache/airflow/pull/58814#discussion_r2580550501


##########
airflow-core/src/airflow/config_templates/config.yml:
##########
@@ -712,6 +712,20 @@ database:
       type: integer
       example: ~
       default: "10000"
+    metadata_indexes:
+      description: |
+        JSON list of additional indexes to create on the metadata database at API server startup.
+
+        Each item must be a string specifying the table and one or more columns:
+        - "table(column1, column2, ...)"
+
+        Existing indexes are detected and skipped. On PostgreSQL, indexes are created
+        CONCURRENTLY to avoid locking tables. Other databases attempt non-blocking creation
+        where supported, otherwise fallback to standard index creation.
+      version_added: 3.2.0
+      type: string
+      example: "task_instance(dag_id, task_id, run_id)|log(dttm)|dag_run(dag_id, run_id)"

Review Comment:
   > I'd suggest we instead build a way to provision these indexes we've identified at component startup.
   
   To give some context: the idea was that we do not know how people will use the API (specific filtering and ordering) or the UI (specific list views), and we cannot create indexes for every possible column combination. It also depends on table sizes: some people will have a really huge DagRun table and need indexes there, others will have that problem on the dag table, and so on. So the idea was to let users decide what they need based on their usage, and to provide a way to help them create the indexes. Can you give more details on what you have in mind here?
   
   > if someone knows what indexes to add, they can just add them... This feels like a footgun.
   
   That is true, I thought about that too at some point.
   
   Maybe just expanding the docs with common use cases for adding specific indexes to the db, along with guidance on how to identify those bottlenecks, is enough as a starting point? This is for power users and very specific use cases of Airflow.
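
For illustration, here is a minimal Python sketch of how one item in the `table(column1, column2, ...)` format from the config description could be parsed and turned into a PostgreSQL statement. The helper names (`parse_index_spec`, `index_name`, `postgres_create_index_sql`) and the `idx_` naming scheme are assumptions made for this example, not the implementation in the PR.

```python
import re

# Hypothetical helpers for illustration only; not the code from the PR.
_SPEC_RE = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*\(\s*(.+?)\s*\)\s*$")
_IDENT_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def parse_index_spec(spec: str) -> tuple[str, list[str]]:
    """Split "table(col1, col2)" into ("table", ["col1", "col2"])."""
    match = _SPEC_RE.match(spec)
    if match is None:
        raise ValueError(f"Invalid index spec: {spec!r}")
    table = match.group(1)
    columns = [col.strip() for col in match.group(2).split(",")]
    # Accept only plain identifiers so a config value cannot smuggle in SQL.
    for col in columns:
        if not _IDENT_RE.match(col):
            raise ValueError(f"Invalid column name {col!r} in spec {spec!r}")
    return table, columns


def index_name(table: str, columns: list[str]) -> str:
    """Build a deterministic index name (assumed naming scheme)."""
    return "idx_" + "_".join([table, *columns])


def postgres_create_index_sql(spec: str) -> str:
    """Render a non-blocking CREATE INDEX statement for PostgreSQL."""
    table, columns = parse_index_spec(spec)
    name = index_name(table, columns)
    cols = ", ".join(columns)
    # IF NOT EXISTS makes the statement safe to re-run on every startup.
    return f"CREATE INDEX CONCURRENTLY IF NOT EXISTS {name} ON {table} ({cols})"


if __name__ == "__main__":
    print(postgres_create_index_sql("task_instance(dag_id, task_id, run_id)"))
```

Restricting table and column names to plain identifiers keeps a config value from injecting arbitrary SQL. PostgreSQL does not allow `CREATE INDEX CONCURRENTLY` inside a transaction block, so a statement like this would have to be executed in autocommit mode.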



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

