Hi folks,

Just wanted to bring this up and see what people think.

IIUC, JHS memory consumption depends on the number of jobs, the number of tasks per job, and the number of concurrent accesses. There might be a few orthogonal approaches to improving its scalability:

- It appears we process jhist files on every access. Maybe we could store the results in a different file and consult that first. We might be able to store all these events in ATS and use it for aggregation etc., but it might be a while before ATS is production-ready.
- Active/active HA: We could bring up multiple instances of JHS behind a load balancer. Moving/deleting history files needs to be done by only one of them: we could have a leader that does all of this, or have ZK locks for the directories being processed.

Would like to hear experiences/thoughts/suggestions from the community.

Thanks
Karthik
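
P.S. To make the first idea a bit more concrete, here is a rough sketch of "store the results in a different file and consult that first". Everything in it is an assumption for illustration: the `SummaryCache` class, the `.summary` sidecar naming, and the `parser` callback are hypothetical and not existing JHS code. It also uses the local filesystem via `java.nio`, whereas a real version would presumably go through the Hadoop `FileSystem` API against HDFS.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.function.Function;

public class SummaryCache {

    // Hypothetical naming convention: job_1.jhist -> job_1.jhist.summary,
    // stored in the same directory as the history file.
    static Path summaryPathFor(Path jhist) {
        Path abs = jhist.toAbsolutePath();
        return abs.resolveSibling(abs.getFileName() + ".summary");
    }

    // Return the precomputed summary if the sidecar file exists; otherwise
    // run the (expensive) parser once and persist its result for later reads.
    static String loadSummary(Path jhist, Function<Path, String> parser)
            throws IOException {
        Path summary = summaryPathFor(jhist);
        if (Files.exists(summary)) {
            return new String(Files.readAllBytes(summary), StandardCharsets.UTF_8);
        }

        String result = parser.apply(jhist);

        // Write to a temp file in the same directory, then rename, so that a
        // concurrent reader never observes a partially written summary.
        Path tmp = Files.createTempFile(summary.getParent(), "summary", ".tmp");
        Files.write(tmp, result.getBytes(StandardCharsets.UTF_8));
        Files.move(tmp, summary, StandardCopyOption.ATOMIC_MOVE);
        return result;
    }
}
```

With something like this, only the first access to a job pays the parsing cost, and later accesses read the much smaller summary. It does not address who cleans up the summary files when history files are moved or deleted; that would fall to whichever instance owns the directory under the second idea.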
