KaranShishodia opened a new issue, #387:
URL: https://github.com/apache/flink-agents/issues/387

   ### Search before asking
   
   - [x] I searched in the 
[issues](https://github.com/apache/flink-agents/issues) and found nothing 
similar.
   
   ### Description
   
   The current design feeds all items into the LLM at once, which risks 
exceeding the context window. A practical solution is to introduce batched 
summarization with the following workflow:
   - Chunking Input Data
     - Split the memory items into fixed-size batches (configurable, e.g., 50–100 items per batch).
     - Each batch is summarized independently, producing a concise representation.
   - Hierarchical Summarization
     - After batch-level summaries are generated, feed those summaries into a second summarization step.
     - This produces a global summary that captures the overall context without overwhelming the LLM.
   - Configurable Parameters
     - Allow users to configure (a rough sketch of a config object follows this list):
       - Batch size (number of items per summarization call).
       - Summarization depth (single-pass vs. hierarchical).
       - Retention policy (e.g., keep both batch summaries and global summary for traceability).
   
   
   
   - Implementation Sketch (pseudo-Java)
   
   ```java
   List<String> items = getMemoryItems();
   int batchSize = 50;
   List<String> batchSummaries = new ArrayList<>();
   
   // Summarize each fixed-size batch independently.
   for (int i = 0; i < items.size(); i += batchSize) {
       List<String> batch = items.subList(i, Math.min(i + batchSize, items.size()));
       batchSummaries.add(llm.summarize(batch));
   }
   
   // Hierarchical step: collapse the batch summaries into one global summary.
   String globalSummary = llm.summarize(batchSummaries);
   storeSummary(globalSummary);
   ```
   
   - Advantages
     - Prevents context overflow by respecting LLM limits.
     - Scales to very large memory sets.
     - Maintains fidelity by layering summaries instead of discarding details.
   
   
   - Next Steps
     - Add a runtime configuration option for batch size and summarization depth.
     - Implement a summarization operator in the Flink runtime that can be reused across agents (a rough sketch follows this list).
     - Provide benchmarks comparing single-pass vs. batched summarization to validate efficiency.
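   
   As a starting point for that operator, here is a minimal sketch using the plain Flink DataStream `FlatMapFunction` API. It is illustrative only: `BatchSummarizeFunction` and the injected `Summarizer` are hypothetical names, and the sketch does not assume any particular flink-agents LLM interface.
   
   ```java
   import java.io.Serializable;
   import java.util.ArrayList;
   import java.util.List;
   import java.util.function.Function;
   
   import org.apache.flink.api.common.functions.FlatMapFunction;
   import org.apache.flink.util.Collector;
   
   /** Illustrative only: buffers incoming memory items and emits one summary per batch. */
   public class BatchSummarizeFunction implements FlatMapFunction<String, String> {
   
       /** Hypothetical LLM hook; must be serializable so it can ship with the job graph. */
       public interface Summarizer extends Function<List<String>, String>, Serializable {}
   
       private final int batchSize;
       private final Summarizer summarizer;
       // NOTE: this buffer is not checkpointed; a production operator would keep it in Flink state.
       private final List<String> buffer = new ArrayList<>();
   
       public BatchSummarizeFunction(int batchSize, Summarizer summarizer) {
           this.batchSize = batchSize;
           this.summarizer = summarizer;
       }
   
       @Override
       public void flatMap(String item, Collector<String> out) {
           buffer.add(item);
           if (buffer.size() >= batchSize) {
               out.collect(summarizer.apply(new ArrayList<>(buffer)));
               buffer.clear();
           }
       }
   }
   ```
   
   Wiring it in would look roughly like `items.flatMap(new BatchSummarizeFunction(50, batch -> llm.summarize(batch)))`; applying the same function once more to the emitted summaries would give the hierarchical step.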
   
   ### Are you willing to submit a PR?
   
   - [x] I'm willing to submit a PR!

