[
https://issues.apache.org/jira/browse/BEAM-11984?focusedWorklogId=598940&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-598940
]
ASF GitHub Bot logged work on BEAM-11984:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 19/May/21 00:20
Start Date: 19/May/21 00:20
Worklog Time Spent: 10m
Work Description: ajamato commented on pull request #14770:
URL: https://github.com/apache/beam/pull/14770#issuecomment-843651996
I am not sure why, but this doesn't show up in my GitHub mentions or reviews.
Please DM me if you need me to look at the PR.
> Should all the `self._client.objects.Get()` calls be added to the metrics
> or just the ones pointed out in the document? Like this
> [function](https://github.com/apache/beam/blob/309fc99a8a94a8dc42a4e817002cc084da5a2811/sdks/python/apache_beam/io/gcp/gcsio.py#L593),
> which also makes this request but is not pointed out in the document.
Anywhere the GCS IO reads or writes objects to GCS needs instrumentation. I
don't think I identified all the locations, so please see if you can locate
them all.
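As a rough illustration of instrumenting every read and write call site, the sketch below counts calls by method name and outcome using only the standard library. `RequestCounter` and `fake_get` are hypothetical stand-ins, not Beam APIs; the real change would use whatever metric helper the handoff doc prescribes.

```python
import collections
import functools


class RequestCounter:
    """Hypothetical stand-in for a request-count metric; not a Beam class."""

    def __init__(self):
        # Maps (method_name, status) to the number of calls observed.
        self.counts = collections.Counter()

    def instrument(self, method_name):
        """Wrap a callable so each call is counted by (method, status)."""
        def decorator(fn):
            @functools.wraps(fn)
            def wrapper(*args, **kwargs):
                try:
                    result = fn(*args, **kwargs)
                except Exception:
                    self.counts[(method_name, 'error')] += 1
                    raise
                self.counts[(method_name, 'ok')] += 1
                return result
            return wrapper
        return decorator


counter = RequestCounter()


@counter.instrument('objects.get')
def fake_get(bucket, name):
    # Placeholder for a real GCS request such as self._client.objects.Get().
    if not name:
        raise ValueError('object name is required')
    return {'bucket': bucket, 'name': name}
```

Counting failures as well as successes matters here, since a request that raises still reached (or tried to reach) GCS.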
> In the implementation guide, a reference is made to the
> `GcsUtil.java.getObject`[1]. However, I'm not sure if the metrics should be
> added in the Python code's equivalent (which I think is this[2]) or in this
> specific piece of code[1].
>
> [1]
> [GcsUtil.java#L286](https://github.com/apache/beam/blob/3bb232fb098700de408f574585dfe74bbaff7230/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java#L286)
> [2]
> [gcsio.py#L613](https://github.com/apache/beam/blob/920553e8f2743d2709b786c16a2f916a2a8c9389/sdks/python/apache_beam/io/gcp/gcsio.py#L613)
Whichever code GCSIO actually uses to read and write objects to GCS is the
appropriate place.
I would run a pipeline on the direct runner and add logging to identify it.
Also check whether there are extra modes that might cause it to use a
different code path.
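One way to do that logging, sketched with the standard library only: wrap the methods of interest so each call is logged with a stack trace showing which code path made it. `FakeObjects` and `log_calls` are hypothetical names for illustration; in a real run the wrapped object would be something like `self._client.objects`.

```python
import functools
import logging

_LOGGER = logging.getLogger('gcs_call_trace')


def _make_wrapper(name, original, seen):
    """Build a wrapper that records and logs a call, then delegates."""
    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        seen.append(name)
        # stack_info=True appends the caller's stack to the log record.
        _LOGGER.warning('GCS call: %s', name, stack_info=True)
        return original(*args, **kwargs)
    return wrapper


def log_calls(obj, method_names):
    """Replace the named methods on obj with logging wrappers.

    Returns the list that accumulates the names of the calls made.
    """
    seen = []
    for name in method_names:
        setattr(obj, name, _make_wrapper(name, getattr(obj, name), seen))
    return seen


class FakeObjects:
    """Hypothetical stand-in for the client's objects service."""

    def Get(self, request):
        return 'get:' + request

    def Insert(self, request):
        return 'insert:' + request
```

Running the pipeline once per mode and comparing the logged stacks should show whether any mode reaches GCS through a call site that was missed.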
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 598940)
Time Spent: 3h 20m (was: 3h 10m)
> Python GCS - Implement IO Request Count metrics
> -----------------------------------------------
>
> Key: BEAM-11984
> URL: https://issues.apache.org/jira/browse/BEAM-11984
> Project: Beam
> Issue Type: Test
> Components: io-py-gcp
> Reporter: Alex Amato
> Assignee: Rogelio Miguel Hernandez Sandoval
> Priority: P2
> Time Spent: 3h 20m
> Remaining Estimate: 0h
>
> Reference PRs (See BigQuery IO example) and detailed explanation of what's
> needed to instrument this IO with Request Count metrics is found in this
> handoff doc:
> [https://docs.google.com/document/d/1lrz2wE5Dl4zlUfPAenjXIQyleZvqevqoxhyE85aj4sc/edit'|https://docs.google.com/document/d/1lrz2wE5Dl4zlUfPAenjXIQyleZvqevqoxhyE85aj4sc/edit'?authuser=0]
--
This message was sent by Atlassian Jira
(v8.3.4#803005)