[
https://issues.apache.org/jira/browse/BEAM-7984?focusedWorklogId=302431&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-302431
]
ASF GitHub Bot logged work on BEAM-7984:
----------------------------------------
Author: ASF GitHub Bot
Created on: 27/Aug/19 22:00
Start Date: 27/Aug/19 22:00
Worklog Time Spent: 10m
Work Description: chadrik commented on issue #9344: [BEAM-7984] The coder
returned for typehints.List should be IterableCoder
URL: https://github.com/apache/beam/pull/9344#issuecomment-525098325
> If the user specifies the type as List, should we ensure that we always
return a List? (I.e should we explicitly state our assumption here that
IterableCoder returns a list and also adda note in the implementation of
IterableCoderImpl that if this changes we need to create a separate ListCoder?
Or should we go ahead and create a separate ListCoder now?
I dug into this a bit more and realized that I was mistaken about my
assertion that `IterableCoderImpl.decode` always return a list. Here's the
relevant code from the base class:
```python
class SequenceCoderImpl(StreamCoderImpl):
def __init__(self, elem_coder,
read_state=None, write_state=None, write_state_threshold=0):
self._elem_coder = elem_coder
self._read_state = read_state
self._write_state = write_state
self._write_state_threshold = write_state_threshold
...
def decode_from_stream(self, in_stream, nested):
size = in_stream.read_bigendian_int32()
if size >= 0:
elements = [self._elem_coder.decode_from_stream(in_stream, True)
for _ in range(size)]
else:
elements = []
count = in_stream.read_var_int64()
while count > 0:
for _ in range(count):
elements.append(self._elem_coder.decode_from_stream(in_stream,
True))
count = in_stream.read_var_int64()
if count == -1:
if self._read_state is None:
raise ValueError(
'Cannot read state-written iterable without state reader.')
state_token = in_stream.read_all(True)
elements = _ConcatSequence(
elements, self._read_state(state_token, self._elem_coder))
return self._construct_from_sequence(elements)
```
`_ConcatSequence` and `self._read_state` are both iterators.
Some options:
- create a `ListCoder`: the problem is this won't be portable, which
defeats the original motivation here, for external transforms
- make `IterableCoder` always return a `list`: It usually returns a list,
but returns an iterator if a `read_state` is provided (not sure when this
happens). In that case it _seems_ like it progressively yields elements as
they are decoded, but I'm not 100% clear on that. If that _is_ the case,
converting to a list on decode would undermine the usefulness of this feature.
- leave it as is: `list` is iterable after all, and in the case that the
user provides a list to encode, they will get a list back, since there
shouldn't be a `read_state` in that case.
So as it is, `IterableCoder` is true to its name in that it really only
guarantees that the decoded value is iterable. However, as the tests that I
added imply, in the typical case, the result is actually a list. So I think
the best thing to do is for me to update the test that I added to assert that
the returned type is a list, so that if that ever changes, someone is forced to
consider it.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 302431)
Time Spent: 40m (was: 0.5h)
> [python] The coder returned for typehints.List should be IterableCoder
> ----------------------------------------------------------------------
>
> Key: BEAM-7984
> URL: https://issues.apache.org/jira/browse/BEAM-7984
> Project: Beam
> Issue Type: Improvement
> Components: sdk-py-core
> Reporter: Chad Dombrova
> Assignee: Chad Dombrova
> Priority: Major
> Time Spent: 40m
> Remaining Estimate: 0h
>
> IterableCoder encodes a list and decodes to list, but
> typecoders.registry.get_coder(typehints.List[bytes]) returns a
> FastPrimitiveCoder. I don't see any reason why this would be advantageous.
--
This message was sent by Atlassian Jira
(v8.3.2#803003)