[I] [VL] Add details to the "time of input iterator" metric [incubator-gluten]

via GitHub Wed, 03 Sep 2025 09:53:27 -0700


marin-ma opened a new issue, #10618:
URL: https://github.com/apache/incubator-gluten/issues/10618


   ### Description
   
   For the "time of input iterator" metric in the `InputIteratorTransformer`, 
what this time represents depends on the operator that precedes it.
   
   Below are the three cases I have observed:
   
   1. **When the previous operator is shuffle:**
   The time is primarily the total shuffle read time, including fetch wait time 
and native reader processing time (such as decompression and deserialization).
   2. **When the previous operator is broadcast:**
   The time is nearly zero because the broadcast is already executed before the 
pipeline starts.
   3. **For other cases (e.g., ColumnarUnion or fallback operators within the 
same Spark stage as the previous Velox pipelines):**
   Velox measures `wallTimeNanos` around the driver's `getOutput` call, so the 
time spent in the previous pipelines is included in the `getOutput` call from 
the `ValueStreamNode`. In this case, the time of input iterator represents the 
total time elapsed since the beginning of the current stage.
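
The three cases can be illustrated with a toy model of a pull-based pipeline. This is only a sketch under stated assumptions, not Gluten or Velox code: `FakeClock`, `timed_input_iterator`, `same_stage_upstream`, and `shuffle_read` are hypothetical names, and the costs are made-up numbers. The point it tries to show is that a timer wrapped around each pull from the input also counts whatever work the source does lazily inside that pull.

```python
class FakeClock:
    """Deterministic stand-in for wall time; work() advances it."""
    def __init__(self):
        self.now = 0

    def work(self, nanos):
        self.now += nanos


_DONE = object()  # sentinel marking an exhausted source


def timed_input_iterator(clock, source):
    """Pull every batch from source, summing the time spent inside each pull."""
    total = 0
    batches = []
    it = iter(source)
    while True:
        start = clock.now
        batch = next(it, _DONE)
        total += clock.now - start  # includes any lazy upstream work
        if batch is _DONE:
            break
        batches.append(batch)
    return batches, total


def same_stage_upstream(clock, n, cost):
    """Hypothetical upstream pipeline in the same stage: computes when pulled."""
    for i in range(n):
        clock.work(cost)
        yield i


def shuffle_read(clock, n, fetch_wait, decode):
    """Hypothetical shuffle reader: fetch wait plus decompress/deserialize."""
    for i in range(n):
        clock.work(fetch_wait)
        clock.work(decode)
        yield i


# Case 1: shuffle. The metric is roughly the total shuffle read time.
c1 = FakeClock()
_, shuffle_time = timed_input_iterator(c1, shuffle_read(c1, 2, 70, 30))

# Case 2: broadcast. The data is materialized before the pipeline starts.
c2 = FakeClock()
broadcast_data = list(same_stage_upstream(c2, 3, 100))
_, broadcast_time = timed_input_iterator(c2, broadcast_data)

# Case 3: upstream in the same stage. Its work runs inside each pull.
c3 = FakeClock()
_, same_stage_time = timed_input_iterator(c3, same_stage_upstream(c3, 3, 100))

print(shuffle_time, broadcast_time, same_stage_time)
```

In this model the same timer reports the shuffle read cost, zero, or the whole upstream cost, depending only on what feeds the iterator.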
   
   This difference in what the metric means across cases is not documented 
and may confuse users. It would be better to document it and highlight it in 
the metric's description.
   
   ### Gluten version
   
   None


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
