[
https://issues.apache.org/jira/browse/BEAM-5805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zach Moshe updated BEAM-5805:
-----------------------------
Description:
Consider the following pipeline:
{{with beam.Pipeline(..) as p: }}
{{ res = p | ... | beam.Partition(..)}}
When res is an `apache_beam.pvalue.DoOutputsTuple`, it supports access by
`res[0]` and `res["0"]`. However, if res is an
`apache_beam.transforms.ptransform._MaterializedDoOutputsTuple`, integer
access isn't supported and the outputs must be accessed by string keys, which
is not very intuitive considering that `partition_fn` returns integers.
I'm not familiar with Beam internals, but I briefly looked into the code and saw
that _MaterializedDoOutputsTuple overrides the __getitem__() of DoOutputsTuple
and doesn't have the explicit casting
([https://github.com/apache/beam/blob/master/sdks/python/apache_beam/pvalue.py#L225]).
It also looks like [~gildea] already left a related comment there.
Is this on purpose? Can I expect an access-by-int API for Partition() results
regardless of whether it was materialized or not?
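To make the report concrete, here is a minimal sketch of the behavior described above. It uses simplified stand-in classes, not Beam's real ones: the names `OutputsTuple`, `MaterializedOutputsTuple`, `FixedMaterializedOutputsTuple` and `lookup` are hypothetical, and the `KeyError` is only how this sketch fails (the actual error in Beam may differ). It assumes outputs are stored under string tags and that the base class casts integer tags to strings, as described in the issue.

```python
# Simplified stand-ins for illustration only; these are NOT Beam's real classes.

class OutputsTuple:
    """Models a tagged-output container whose __getitem__ accepts int or str tags."""

    def __init__(self, outputs):
        # Outputs are stored under string tags, e.g. {"0": ..., "1": ...}.
        self._outputs = dict(outputs)

    def __getitem__(self, tag):
        # Explicit cast: an integer tag is normalized to its string form.
        if isinstance(tag, int):
            tag = str(tag)
        return self._outputs[tag]


class MaterializedOutputsTuple(OutputsTuple):
    """Models a subclass that overrides __getitem__ and drops the cast."""

    def __getitem__(self, tag):
        # No int-to-str normalization here, so an integer tag is not found.
        return self._outputs[tag]


class FixedMaterializedOutputsTuple(OutputsTuple):
    """One possible fix (an assumption): normalize the tag before the lookup."""

    def __getitem__(self, tag):
        if isinstance(tag, int):
            tag = str(tag)
        return self._outputs[tag]


def lookup(container, tag):
    """Return the value stored for tag, or the string "KeyError" if it is missing."""
    try:
        return container[tag]
    except KeyError:
        return "KeyError"


partitions = {"0": ["even"], "1": ["odd"]}
lazy = OutputsTuple(partitions)
materialized = MaterializedOutputsTuple(partitions)
fixed = FixedMaterializedOutputsTuple(partitions)
```

In this sketch `lazy[0]` and `lazy["0"]` both succeed, `materialized[0]` fails while `materialized["0"]` works, and `fixed[0]` succeeds again once the cast is restored in the override.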
> _MaterializedDoOutputsTuple doesn't support __getitem__ by integer values
> -------------------------------------------------------------------------
>
> Key: BEAM-5805
> URL: https://issues.apache.org/jira/browse/BEAM-5805
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Reporter: Zach Moshe
> Assignee: Ahmet Altay
> Priority: Major
>
> Consider the following pipeline:
> {{with beam.Pipeline(..) as p: }}
> {{ res = p | ... | beam.Partition(..)}}
>
> When res is an `apache_beam.pvalue.DoOutputsTuple`, it supports access by
> `res[0]` and `res["0"]`. However, if res is an
> `apache_beam.transforms.ptransform._MaterializedDoOutputsTuple`, integer
> access isn't supported and the outputs must be accessed by string keys, which
> is not very intuitive considering that `partition_fn` returns integers.
>
> I'm not familiar with Beam internals, but I briefly looked into the code and
> saw that _MaterializedDoOutputsTuple overrides the __getitem__() of
> DoOutputsTuple and doesn't have the explicit casting
> ([https://github.com/apache/beam/blob/master/sdks/python/apache_beam/pvalue.py#L225]).
> It also looks like [~gildea] already left a related comment there.
>
> Is this on purpose? Can I expect an access-by-int API for Partition() results
> regardless of whether it was materialized or not?