[PR] fix(providers/common-ai): LlamaIndexEmbeddingOperator always returns … [airflow]

via GitHub Fri, 12 Jun 2026 00:29:19 -0700


bramhanandlingala opened a new pull request, #68434:
URL: https://github.com/apache/airflow/pull/68434


   Summary:
   LlamaIndexEmbeddingOperator.execute() always returned "vector": None for 
every chunk in the output. The root cause is that 
VectorStoreIndex._get_node_with_embedding() calls node.model_copy() and assigns 
the embedding to the copy, so the original node objects are never mutated. This 
has been the behavior across all tested llama-index-core versions (v0.10–v0.14).
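
The copy-then-assign behavior described in the summary can be illustrated with a minimal sketch. It does not use llama-index: `StubNode` and `stub_get_node_with_embedding` are hypothetical stand-ins written for this example, and `copy.copy` is assumed to be a fair approximation of what `model_copy()` does to a node.

```python
import copy
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StubNode:
    """Hypothetical stand-in for a llama-index node (not the real class)."""

    text: str
    embedding: Optional[List[float]] = None

    def model_copy(self) -> "StubNode":
        # Approximates model_copy(): returns a new object, leaves self untouched.
        return copy.copy(self)


def stub_get_node_with_embedding(nodes, vectors):
    """Simplified imitation of the copy-then-assign pattern (an assumption)."""
    results = []
    for node, vector in zip(nodes, vectors):
        result = node.model_copy()
        result.embedding = vector  # set on the copy only
        results.append(result)
    return results


nodes = [StubNode("chunk a"), StubNode("chunk b")]
embedded = stub_get_node_with_embedding(nodes, [[0.1, 0.2], [0.3, 0.4]])

# The caller still holds the original nodes, whose embeddings were never set.
original_vectors = [n.embedding for n in nodes]
copied_vectors = [n.embedding for n in embedded]
print(original_vectors)
print(copied_vectors)
```

Any code that reads `.embedding` from the nodes it passed in, rather than from the copies, would see only None values under this pattern.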
   
   Fix:
   Pre-embed all nodes by calling embed_model.get_text_embedding_batch() and 
assigning the resulting vectors onto the original node objects before passing 
them to VectorStoreIndex. Since embed_nodes() inside VectorStoreIndex skips 
nodes whose .embedding is already populated, this produces no duplicate 
embedding API calls.
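
A rough sketch of the pre-embedding approach follows, again using stubs rather than llama-index itself. `StubNode`, `StubEmbedModel`, `pre_embed`, and `stub_build_index` are hypothetical names for this example; only `get_text_embedding_batch()` and the "vector" output key come from the description above, and the skip logic is a simplified assumption about how already-embedded nodes are handled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StubNode:
    """Hypothetical stand-in for a llama-index node (not the real class)."""

    text: str
    embedding: Optional[List[float]] = None


class StubEmbedModel:
    """Hypothetical embed model that counts how often it is called."""

    def __init__(self) -> None:
        self.batch_calls = 0

    def get_text_embedding_batch(self, texts: List[str]) -> List[List[float]]:
        self.batch_calls += 1
        # Fake vectors derived from text length, for illustration only.
        return [[float(len(t)), 0.0] for t in texts]


def pre_embed(nodes: List[StubNode], embed_model: StubEmbedModel) -> None:
    """Embed all nodes in one batch and assign onto the original objects."""
    vectors = embed_model.get_text_embedding_batch([n.text for n in nodes])
    for node, vector in zip(nodes, vectors):
        node.embedding = vector


def stub_build_index(nodes: List[StubNode], embed_model: StubEmbedModel) -> None:
    """Imitates the assumed skip: only nodes without an embedding are embedded."""
    pending = [n for n in nodes if n.embedding is None]
    if pending:
        embed_model.get_text_embedding_batch([n.text for n in pending])


nodes = [StubNode("chunk a"), StubNode("longer chunk b")]
model = StubEmbedModel()

pre_embed(nodes, model)         # vectors now live on the original nodes
stub_build_index(nodes, model)  # nothing left to embed, so no second call

output = [{"text": n.text, "vector": n.embedding} for n in nodes]
print(output)
print(model.batch_calls)
```

Because the vectors are assigned before the index is built, the originals carry them regardless of what the index does with its own copies.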
   
   Testing:
   - Manually reproduced the bug with a mock stub mirroring llama-index's 
     model_copy() behavior.
   - Verified that the fix produces correct non-None vectors for all chunks.
   - No changes to the public API or operator interface.
   
   Related: Fixes #68416


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
