Ironlink commented on issue #22160:
URL: https://github.com/apache/beam/issues/22160#issuecomment-1182979277
@egalpin I currently have:

* 6.5 TB of source documents
* 8.5 TB of primary shard data (measured as index disk usage when the index has no replicas)
* 6 data nodes with 29 GB of machine memory each (3 GB lost to overhead)
* 2 coordinator nodes with 13 GB of machine memory each (3 GB lost to overhead)

In my pipeline, I have a maximum bulk load size of 45 MB and a maximum of 20 workers, which results in 40 threads on Dataflow (a sketch of this setup follows below). 40 threads times 45 MB gives 1,800 MB of concurrent indexing data. Bulk requests are load-balanced between the two coordinator nodes, which split and route the requests to the respective data nodes based on document routing.

By default, the ES coordinator nodes had allocated 50 % of memory to the application heap and the other 50 % to OS file system caching. As noted by the linked blog entry, the default limit for concurrent indexing data is 10 % of the application heap. In absolute numbers, this means the default limit for concurrent indexing data was about 600 MB per coordinator node.

I came across the linked blog entry as part of my troubleshooting and adjusted my coordinator nodes as a result. Specifically, I raised the heap size from 6 GB to 10 GB, and raised `indexing_pressure.memory.limit` from 10 % to 20 %, since the coordinators do nothing but split and forward requests (see the config sketch below). With this, the limit per coordinator becomes 2 GB. I have not yet had the time to run my pipeline with this configuration, but it seems likely that this will work without any rejected requests, since each coordinator on its own is able to fit the full 1,800 MB.
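For concreteness, here is a minimal sketch of the pipeline side. The host names, index name, sample document, and class name are placeholders I made up for illustration; the 45 MB batch cap and the 20-worker limit are the values from my actual runs, and the two-threads-per-worker figure is specific to my Dataflow worker setup:

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.values.PCollection;

public class BulkIndexPipeline {
  public static void main(String[] args) {
    // On Dataflow, cap the fleet with e.g.
    //   --runner=DataflowRunner --maxNumWorkers=20
    // With two bundle-processing threads per worker (as in my setup), that
    // yields at most 40 concurrent bulk requests: 40 * 45 MB = 1,800 MB.
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    Pipeline pipeline = Pipeline.create(options);

    // Stand-in for the real source; documents are JSON strings.
    PCollection<String> docs =
        pipeline.apply(Create.of("{\"title\": \"example document\"}"));

    docs.apply(
        ElasticsearchIO.write()
            .withConnectionConfiguration(
                ElasticsearchIO.ConnectionConfiguration.create(
                    // Bulk requests are load-balanced across the coordinators.
                    new String[] {
                      "http://coordinator-1:9200", "http://coordinator-2:9200"
                    },
                    "my-index"))
            // Cap each bulk request at 45 MB.
            .withMaxBatchSizeBytes(45L * 1024 * 1024));

    pipeline.run().waitUntilFinish();
  }
}
```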
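The coordinator-side changes map onto the standard Elasticsearch config files roughly as follows (a sketch of just the relevant lines, not my complete files):

```
# jvm.options on each coordinator node: raise the heap from 6 GB to 10 GB.
-Xms10g
-Xmx10g
```

```yaml
# elasticsearch.yml on each coordinator node: raise the indexing-pressure
# limit from the default 10 % to 20 % of heap (20 % of 10 GB = 2 GB).
indexing_pressure.memory.limit: 20%
```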
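Once I do run the pipeline, the effect should be observable from the node stats API (available since ES 7.10; the hostname is again a placeholder). A non-zero `coordinating_rejections` counter on either coordinator would mean requests are still being pushed back:

```sh
# Returns a per-node breakdown; check the two coordinator entries.
# memory.limit_in_bytes should now report ~2 GB, and
# memory.total.coordinating_rejections should stay at 0 under load.
curl -s 'http://coordinator-1:9200/_nodes/stats/indexing_pressure?pretty'
```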
