sanjeet006py commented on PR #2061: URL: https://github.com/apache/phoenix/pull/2061#issuecomment-2677810081
> 1 ms was important for a huge time taken? How huge was the huge time? Extra 1ms was spent in every `executeUpdate()` call and a user was trying to insert 2M rows so, it ended up taking ~26 mins while if we remove that 1ms from `executeUpdate()` then 2M rows can be inserted within 24-25 secs. And, the intention was to ultimately insert 40M rows which would shown regression in hrs. The extra 1ms was being taken to get table metadata from SYSCAT. > Did the 1 ms include any metadata RPCs, and if so, should the metric be captured specifically for calls to SYSTEM.CATALOG or meta or something similar? In this way, we need not spend cycles measuring local work that is expected to be fast. The extra 1ms did come from time spent in call to SYSCAT to get table metadata and we can add a metric to specifically tell us if call to SYSCAT RS was done or not. But in my opinion a regression can be introduced in these paths at any time. This time the regression was extra metadata call but next time it can be something else. So, I think we can capture the time taken by stages to ensure that each stage is behaving as expected. > Is this metric/log only going to be useful in the cases where we send RPCs, or do we think that we really are spending a lot of time locally in planning, without any network calls? I believe this metric can help us highlight any future regressions introduced in the path of mutation planning as this time the first task was to confirm that a regression has been introduced. Currently, because of lack of metrics for mutation planning stages we can't even tell if a regression has been introduced. > Any JVM pause for GC or otherwise could likely last longer than 0.5 milliseconds, so the log message, if that's the choice, shouldn't be misleading that it is some kind of error or egregious performance scenario. Yeah, I plan to log a generic warning message that mutation planning took x time compared to some expected y threshold and I also plan to make that threshold configurable on client side via a config. > I suppose we do want to know metrics around how many RPCs are required to serve the request, especially when those include additional RPCs like system RPCs, which may not always be required. But those are more difficult to instrument, and so we are choosing to instrument mutation planning, only because it's a top-level "span" and we don't need to plumb the instrumentation all the way down through the code? Actually, I didn't thought of adding a metric to measure if RPC call is being done to SYSCAT to get metadata as my aim was to have a way to tell in future that regression has been introduced in mutation planning stage as its a very hot method call. Having such a metric can help quickly root cause. So, once we know that a regression has been introduced in mutation planning then we can check such metric to quickly root cause. WDYT? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@phoenix.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org