Hi folks

Just wanted to bring this up and see what people think.

IIUC, JHS memory consumption depends on the number of jobs, tasks per job,
and concurrent accesses. There might be a few orthogonal approaches to
improving its scalability:

   - It appears we process jhist files on every access. Maybe we could store
   the parsed results in a separate file and consult that first. We might be
   able to store all these events in ATS and use it for aggregation etc., but
   it might be a while before ATS is production-ready.
   - Active/active HA: We could bring up multiple instances of JHS behind a
   load-balancer. Moving/deleting history files needs to be done by exactly
   one of them - we could have a leader that does all of this, or have ZK
   locks for the directories being processed.
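
To make the first idea a bit more concrete, here is a rough sketch of the
"store the results in a different file and consult that first" approach. It
is only an illustration, not how JHS works today: the names parse_jhist and
load_summary, the ".summary.json" suffix, and the one-JSON-event-per-line
input are all assumptions made for the example (real jhist files have more
structure, and JHS itself is Java).

```python
import json
import os
import tempfile


def parse_jhist(path):
    """Hypothetical stand-in for the expensive full parse of a history file.

    Assumes one JSON event per line, which is a simplification.
    """
    with open(path) as f:
        events = [json.loads(line) for line in f if line.strip()]
    return {
        "events": len(events),
        "tasks": sum(1 for e in events if e.get("type") == "TASK_FINISHED"),
    }


def load_summary(jhist_path, parse=parse_jhist):
    """Return the job summary, consulting a sidecar file before re-parsing."""
    summary_path = jhist_path + ".summary.json"  # assumed naming scheme
    try:
        # Use the sidecar only if it is at least as new as the history file.
        if os.path.getmtime(summary_path) >= os.path.getmtime(jhist_path):
            with open(summary_path) as f:
                return json.load(f)
    except (OSError, ValueError):
        pass  # missing or corrupt sidecar: fall through and re-parse

    summary = parse(jhist_path)

    # Write to a temp file and rename, so readers never see a partial sidecar.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(summary_path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(summary, f)
    os.replace(tmp, summary_path)
    return summary
```

With something like this, only the first access to a job pays the parsing
cost; later accesses read the much smaller summary. In an active/active
setup the sidecar would also let every instance share one parse, though who
gets to write it ties back to the leader/lock question above.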

I would like to hear experiences, thoughts, and suggestions from the community.

Thanks
Karthik
