Re: Understanding metrics

xinyu liu Tue, 14 Mar 2017 17:08:38 -0700

Hi, Ankit,

When running your job with multithreading, block-ns here actually includes
process-ns. This is because after your task.process() is submitted to the
thread pool, the run loop thread is blocked until process() completes for
one of the tasks. It's interesting that block-ns (0.3 ms) is much longer
than process-ns (0.12 ms). I am wondering whether you also have window and
checkpoint configured for your job. Since window and checkpoint also run
inside this thread pool to improve parallelism, block-ns will be affected,
because the run loop will block until window/checkpoint completes. If you
are using window/commit, please send us the configs (task.window.ms and
task.commit.ms) and the timer metrics (window-ns and commit-ns). Then we
can correlate them better with block-ns.


Thanks,
Xinyu

On Tue, Mar 14, 2017 at 3:33 PM, Ankit Malhotra <[email protected]>
wrote:

> Wait, block-ns = 0.3ms (300,000ns). Also, why are we not adding in
> choose-ns?
>
> Thanks
> Ankit
>
> On 3/14/17, 6:26 PM, "Jagadish Venkatraman" <[email protected]>
> wrote:
>
>     I would expect (process_ns + block_ns) to be almost the same as 0.15 ms,
>     which makes sense.
>
>     process_ns = 0.12 ms
>     block_ns = 0.03 ms
>     process_ns + block_ns ~ 0.15ms
>
>     This corresponds to the rate of process calls: roughly 1/7000 s ~ 0.15 ms
>     per process call.
>
>     >> Each process call is a separate thread.
>     Given that you have one partition in each container, and you want in-order
>     processing, there will be only one thread processing messages. The other
>     two threads are not doing work and entail a constant (albeit
>     insignificant) synchronization overhead.
>
>     On Tue, Mar 14, 2017 at 3:03 PM, Ankit Malhotra <[email protected]>
>     wrote:
>
>     > Hi,
>     >
>     > We are trying to understand the metrics being populated by our Samza
>     > job and are a little confused about what each of them means, especially
>     > since we’re running the job with a thread pool.
>     >
>     >
>     > · We have 3 input streams
>     > · job.container.thread.pool.size=3
>     > · 1 container per partition
>     > · Using a RocksDB-backed store with changelogging
>     > · process-ns = 120,000
>     > · get-ns ~ 30,000
>     > · put-ns ~ 90,000
>     > · block-ns ~ 300,000
>     > · choose-ns ~ 500,000
>     >
>     > Metrics are avg(metric) for all containers/partitions.
>     >
>     > Process-envelopes ~ 7000/sec.
>     >
>     > If I back the math out, this correlates quite closely with process-ns
>     > (1/7000 s ~ 0.15 ms).
>     >
>     > What I don’t understand is that the event loop is single-threaded. Even
>     > though each process call is a separate thread, the main event loop is
>     > blocking (block-ns) and choosing (choose-ns) every time, and these times
>     > are quite high. Given these block and choose times, how are we
>     > consistently seeing the throughput above?
>     >
>     > Also, lag (messages behind high watermark) is ~ 0.
>     >
>     > Thanks
>     > Ankit
>     >
>
>
>     --
>     Jagadish V,
>     Graduate Student,
>     Department of Computer Science,
>     Stanford University
>
>
>
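The back-of-the-envelope math in this thread (1/7000 s per envelope versus the reported timers) can be checked with a few lines of Java. This is a sketch: the class `ThroughputBudget` is hypothetical, and the inputs are simply the averages Ankit reported. The comparison assumes each timer is a per-envelope average and that the timers do not overlap; per Xinyu's reply, block-ns already includes process-ns, so the two are not added together here.

```java
// Hypothetical helper for the arithmetic discussed in this thread.
public class ThroughputBudget {

    /** Wall-clock nanoseconds available per envelope at the given throughput. */
    static double nsPerEnvelope(double envelopesPerSec) {
        return 1_000_000_000.0 / envelopesPerSec;
    }

    public static void main(String[] args) {
        double budget = nsPerEnvelope(7000); // about 142,857 ns, i.e. ~0.14 ms

        // Averages reported in the thread, in nanoseconds.
        long processNs = 120_000;
        long blockNs = 300_000;
        long chooseNs = 500_000;

        System.out.printf("budget per envelope: %.0f ns%n", budget);
        System.out.printf("process-ns / budget: %.2f%n", processNs / budget);
        // block-ns is used alone because it already includes process-ns.
        System.out.printf("(block-ns + choose-ns) / budget: %.2f%n",
                (blockNs + chooseNs) / budget);
    }
}
```

A ratio well above 1 for block-ns plus choose-ns is the tension raised in the original question: taken at face value as per-envelope costs, those timers would not fit within the observed throughput.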
