bramhanandlingala opened a new pull request, #68434:
URL: https://github.com/apache/airflow/pull/68434
Summary
LlamaIndexEmbeddingOperator.execute() always returned "vector": None for every chunk in its output.

Root cause
VectorStoreIndex._get_node_with_embedding() calls node.model_copy() and assigns the embedding to the copy, so the original node objects are never mutated. This behavior is present in every llama-index-core version tested (v0.10 to v0.14).

Fix
Pre-embed all nodes by calling embed_model.get_text_embedding_batch() and assigning the resulting vectors to the original node objects before passing them to VectorStoreIndex. Because embed_nodes(), which VectorStoreIndex calls internally, skips nodes whose .embedding is already populated, this does not cause duplicate embedding API calls.

Testing
- Manually reproduced the bug with a mock stub that mirrors llama-index's model_copy() behavior.
- Verified that the fix produces correct, non-None vectors for all chunks.
- No changes to the public API or the operator interface.

Related
Fixes #68416
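A minimal, self-contained sketch of the bug and the fix, in the spirit of the mock stub mentioned under Testing. It does not import llama-index: StubNode, StubEmbedModel, stub_embed_nodes, stub_get_node_with_embedding and pre_embed are hypothetical stand-ins that only mirror the behavior described above (the embedding is assigned to a model_copy(), and nodes that already have an embedding are skipped). It is not the operator's actual code, and the stubs embed node.text directly as a simplification.

```python
import copy
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class StubNode:
    """Hypothetical stand-in for a llama-index node (not the real class)."""

    node_id: str
    text: str
    embedding: Optional[List[float]] = None

    def model_copy(self) -> "StubNode":
        # Shallow copy, analogous to pydantic's BaseModel.model_copy().
        return copy.copy(self)


class StubEmbedModel:
    """Hypothetical embedding model that counts how many texts it embeds."""

    def __init__(self) -> None:
        self.calls = 0

    def get_text_embedding_batch(self, texts: List[str]) -> List[List[float]]:
        self.calls += len(texts)
        # Toy embedding: [length of the text, 1.0].
        return [[float(len(t)), 1.0] for t in texts]


def stub_embed_nodes(nodes: List[StubNode], embed_model: StubEmbedModel) -> Dict[str, List[float]]:
    """Mirrors the described skip logic: only nodes without an embedding are embedded."""
    id_to_embed = {n.node_id: n.embedding for n in nodes if n.embedding is not None}
    missing = [n for n in nodes if n.embedding is None]
    if missing:
        vectors = embed_model.get_text_embedding_batch([n.text for n in missing])
        for node, vector in zip(missing, vectors):
            id_to_embed[node.node_id] = vector
    return id_to_embed


def stub_get_node_with_embedding(nodes: List[StubNode], embed_model: StubEmbedModel) -> List[StubNode]:
    """Mirrors the described bug: embeddings are set on copies, never on the originals."""
    id_to_embed = stub_embed_nodes(nodes, embed_model)
    results = []
    for node in nodes:
        result = node.model_copy()
        result.embedding = id_to_embed[node.node_id]
        results.append(result)
    return results


def pre_embed(nodes: List[StubNode], embed_model: StubEmbedModel) -> List[StubNode]:
    """The fix as described: write vectors onto the original node objects first."""
    vectors = embed_model.get_text_embedding_batch([n.text for n in nodes])
    for node, vector in zip(nodes, vectors):
        node.embedding = vector
    return nodes


if __name__ == "__main__":
    # Bug: after "indexing", the original nodes still have no embedding.
    buggy = [StubNode("a", "hello"), StubNode("b", "airflow")]
    stub_get_node_with_embedding(buggy, StubEmbedModel())
    print([n.embedding for n in buggy])  # [None, None]

    # Fix: pre-embed, then "index"; the skip logic prevents a second round of calls.
    fixed = [StubNode("a", "hello"), StubNode("b", "airflow")]
    model = StubEmbedModel()
    pre_embed(fixed, model)
    stub_get_node_with_embedding(fixed, model)
    print([n.embedding for n in fixed])  # [[5.0, 1.0], [7.0, 1.0]]
    print(model.calls)  # 2
```

The call counter is what demonstrates the "no duplicate embedding API calls" claim: two texts are embedded once during pre_embed, and the index-side step finds every node already populated.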
