[ 
https://issues.apache.org/jira/browse/BEAM-6969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy updated BEAM-6969:
--------------------------------
    Description: 
Currently, IO tests measure time using the Metrics API but collect start/end 
times from ParDo transforms that are adjacent to the IO. This is fine for some 
tests but could be done better. The drawback of the current solution is that we 
cannot collect time before PBegin or after PDone. In addition, the time we 
collect now is not the exact time of the read/write start/end but only the time 
at which the first/last record appeared in the DoFn.

See: 
[TimeMonitor.java|https://github.com/apache/beam/blob/957b7cc7746aa626d2eb4dea341f668ec19d5d39/sdks/java/testing/test-utils/src/main/java/org/apache/beam/sdk/testutils/metrics/TimeMonitor.java]
 as an example of such DoFn.

Possible solution: save metrics in the startBundle / finishBundle methods in 
IOs whenever a dedicated pipelineOption is set to true. 
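
A minimal sketch of that idea, written without any Beam dependency so it runs 
on its own. Everything named here (IoTimeTracker, onStartBundle, 
onFinishBundle, the enabled flag, the injected clock) is hypothetical and not 
an existing Beam API. The flag stands in for the dedicated pipelineOption, and 
the two hooks would be called from the IO's @StartBundle / @FinishBundle 
methods. In a real IO the timestamps would presumably be reported through the 
Metrics API rather than kept in fields.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical helper, not part of Beam: keeps the earliest start and latest end seen. */
final class IoTimeTracker {
  // Stands in for the dedicated pipeline option (assumption).
  private final boolean enabled;
  // Injectable clock so the sketch can be exercised deterministically.
  private final LongSupplier clock;
  private final AtomicLong earliestStart = new AtomicLong(Long.MAX_VALUE);
  private final AtomicLong latestEnd = new AtomicLong(Long.MIN_VALUE);

  IoTimeTracker(boolean enabled, LongSupplier clock) {
    this.enabled = enabled;
    this.clock = clock;
  }

  /** Would be called from the IO's start-of-bundle hook. */
  void onStartBundle() {
    if (enabled) {
      // Keep the minimum timestamp across all bundles.
      earliestStart.accumulateAndGet(clock.getAsLong(), Math::min);
    }
  }

  /** Would be called from the IO's finish-of-bundle hook. */
  void onFinishBundle() {
    if (enabled) {
      // Keep the maximum timestamp across all bundles.
      latestEnd.accumulateAndGet(clock.getAsLong(), Math::max);
    }
  }

  /** True once at least one start and one finish were recorded. */
  boolean hasData() {
    return earliestStart.get() != Long.MAX_VALUE && latestEnd.get() != Long.MIN_VALUE;
  }

  /** Latest bundle finish minus earliest bundle start, in the clock's units. */
  long elapsed() {
    if (!hasData()) {
      throw new IllegalStateException("no bundle timestamps recorded");
    }
    return latestEnd.get() - earliestStart.get();
  }
}
```

With the flag off, both hooks do nothing, so regular pipelines would pay 
almost nothing for the extra calls. With a production clock such as 
System::currentTimeMillis, elapsed() returns milliseconds.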

In general, maybe it's a good idea to place some other metrics inside IOs too? 
wdyt?

  was:
Currently, IO tests measure time using Metrics API but collect start/end time 
from ParDo transforms that are adjacent to the IO. It's fine for some tests but 
maybe could be done better. The drawback of the current solution is that we 
cannot collect time before PBegin and after PDone. Other than that the time we 
collect now is still not the exact time of read/write start/end but only the 
time at which first/last record appeared in the DoFn.

See: 
[TimeMonitor.java|https://github.com/apache/beam/blob/957b7cc7746aa626d2eb4dea341f668ec19d5d39/sdks/java/testing/test-utils/src/main/java/org/apache/beam/sdk/testutils/metrics/TimeMonitor.java]
 as an example of such DoFn.

Possible solution: save metrics in startBundle / finishBundle method in IOs 
whenever a dedicated pipelineOption is set to true. 


> Provide way to collect start/end read/write time inside the IOs
> ---------------------------------------------------------------
>
>                 Key: BEAM-6969
>                 URL: https://issues.apache.org/jira/browse/BEAM-6969
>             Project: Beam
>          Issue Type: Wish
>          Components: io-ideas, testing
>            Reporter: Lukasz Gajowy
>            Priority: Minor
>
> Currently, IO tests measure time using the Metrics API but collect start/end 
> times from ParDo transforms that are adjacent to the IO. This is fine for 
> some tests but could be done better. The drawback of the current solution is 
> that we cannot collect time before PBegin or after PDone. In addition, the 
> time we collect now is not the exact time of the read/write start/end but 
> only the time at which the first/last record appeared in the DoFn.
> See: 
> [TimeMonitor.java|https://github.com/apache/beam/blob/957b7cc7746aa626d2eb4dea341f668ec19d5d39/sdks/java/testing/test-utils/src/main/java/org/apache/beam/sdk/testutils/metrics/TimeMonitor.java]
>  as an example of such DoFn.
> Possible solution: save metrics in the startBundle / finishBundle methods in 
> IOs whenever a dedicated pipelineOption is set to true. 
> In general, maybe it's a good idea to place some other metrics inside IOs 
> too? wdyt?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
