One of the lessons that has bubbled up from some performance analysis is
that having the ES and HDFS writers share the same indexing topology can
be problematic from a tuning perspective.  Specifically, it's hard to
square that circle and make both perform fast enough to avoid significant
back-pressure in Kafka (and, often, Commit Exceptions in the Kafka spout).

I wanted to get the community's opinion about the possibility of separating
the two current writers into separate topologies which could be tuned
separately.

Pros:

   - Practically speaking, tuning separately is often a lot easier than
   trying to tune together
   - This gives us the beginnings of an abstraction that may be
   reusable for exposing new indexers to Metron

Cons:

   - It has the potential to mask a problem.  We may want to ensure that
   the writers write at the same rate and don't get far ahead of one another.
   In the current setup, this is inherent in the design.  If we separate them,
   they may be reading at different rates and one index may get ahead of the
   other.
   - The management pack section around indexing would need to be
   reconsidered if we split them up

Personally, I'm strongly in favor of splitting them up, but I want to make
sure that we don't miss an important nuance here.  The first con is
concerning to me, but I'd argue that another lesson from performance tuning
is that we need to monitor the average partition lag over time in the
management UI for the various consumer groups and ensure that writing keeps
up with reading.  If we insist that this holds for all healthy Metron
installations, the primary con goes away in my mind.
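
To make the monitoring idea a bit more concrete, here is a rough Python
sketch of the check I have in mind.  It is only an illustration, not
existing Metron code: the function names are made up, and it assumes
that something else has already collected the log-end offsets and each
consumer group's committed offsets per partition.

```python
from statistics import mean


def partition_lag(end_offsets, committed_offsets):
    """Per-partition lag: log-end offset minus the group's committed offset.

    Both arguments are dicts of partition id -> offset. A partition with
    no committed offset is treated as offset 0 (an assumption of this sketch).
    """
    return {
        partition: max(end - committed_offsets.get(partition, 0), 0)
        for partition, end in end_offsets.items()
    }


def average_lag(end_offsets, committed_offsets):
    """Average lag across all partitions for one consumer group."""
    lags = partition_lag(end_offsets, committed_offsets)
    return mean(lags.values()) if lags else 0.0


def is_keeping_up(avg_lag_samples, tolerance=0):
    """True if average lag grew by at most `tolerance` from the first
    sample to the last, i.e. writing is keeping up with reading."""
    return avg_lag_samples[-1] - avg_lag_samples[0] <= tolerance


if __name__ == "__main__":
    # Hypothetical offsets for a three-partition indexing topic.
    end = {0: 1000, 1: 1200, 2: 900}
    es_committed = {0: 990, 1: 1190, 2: 890}    # hypothetical ES writer group
    hdfs_committed = {0: 700, 1: 800, 2: 600}   # hypothetical HDFS writer group

    print("ES average lag:", average_lag(end, es_committed))
    print("HDFS average lag:", average_lag(end, hdfs_committed))
```

With separate topologies, each writer would have its own consumer group,
so the same check can be run per group, and comparing the two averages
shows how far one index has drifted ahead of the other.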

Anyway, I'm sure I've missed some pros and cons, so it'd be great to hear
community feedback here.  Thoughts?
