[ https://issues.apache.org/jira/browse/BEAM-7984?focusedWorklogId=301620&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301620 ]
ASF GitHub Bot logged work on BEAM-7984: ---------------------------------------- Author: ASF GitHub Bot Created on: 27/Aug/19 01:36 Start Date: 27/Aug/19 01:36 Worklog Time Spent: 10m Work Description: chadrik commented on issue #9344: [BEAM-7984] The coder returned for typehints.List should be IterableCoder URL: https://github.com/apache/beam/pull/9344#issuecomment-525098325 > If the user specifies the type as List, should we ensure that we always return a List? (I.e should we explicitly state our assumption here that IterableCoder returns a list and also adda note in the implementation of IterableCoderImpl that if this changes we need to create a separate ListCoder? Or should we go ahead and create a separate ListCoder now? I dug into this a bit more and realized that I was mistaken about my assertion that `IterableCoderImpl.decode` always return a list. Here's the relevant code from the base class: ```python class SequenceCoderImpl(StreamCoderImpl): def __init__(self, elem_coder, read_state=None, write_state=None, write_state_threshold=0): self._elem_coder = elem_coder self._read_state = read_state self._write_state = write_state self._write_state_threshold = write_state_threshold ... def decode_from_stream(self, in_stream, nested): size = in_stream.read_bigendian_int32() if size >= 0: elements = [self._elem_coder.decode_from_stream(in_stream, True) for _ in range(size)] else: elements = [] count = in_stream.read_var_int64() while count > 0: for _ in range(count): elements.append(self._elem_coder.decode_from_stream(in_stream, True)) count = in_stream.read_var_int64() if count == -1: if self._read_state is None: raise ValueError( 'Cannot read state-written iterable without state reader.') state_token = in_stream.read_all(True) elements = _ConcatSequence( elements, self._read_state(state_token, self._elem_coder)) return self._construct_from_sequence(elements) ``` `_ConcatSequence` and `self._read_state` are both iterators. Some options: - create a `ListCoder`: the problem is this won't be portable, which defeats the original motivation here, for external transforms - make `IterableCoder` always return a `list`: It currently returns an iterator in the (rare?) case that `read_state` is provided, in which case it _seems_ like it progressively yields elements as they are decoded, but I'm not 100% clear on that. If that _is_ the case, converting to a list on decode would undermine the usefulness of this feature. - leave it as is: `list` is iterable after all, and in the case that the user provides a list to encode, they will get a list back, since there shouldn't be a `read_state`. So as it is, `IterableCoder` is true to its name in that it really only guarantees that the decoded value is iterable. However, as the tests that I added imply, in the typical case, the result is actually a list. So I think the best thing to do is to for me to update the test that I added to assert that the returned type is a list, so that if that ever changes, someone is forced to consider it. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking ------------------- Worklog Id: (was: 301620) Time Spent: 0.5h (was: 20m) > [python] The coder returned for typehints.List should be IterableCoder > ---------------------------------------------------------------------- > > Key: BEAM-7984 > URL: https://issues.apache.org/jira/browse/BEAM-7984 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core > Reporter: Chad Dombrova > Assignee: Chad Dombrova > Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > IterableCoder encodes a list and decodes to list, but > typecoders.registry.get_coder(typehints.List[bytes]) returns a > FastPrimitiveCoder. I don't see any reason why this would be advantageous. -- This message was sent by Atlassian Jira (v8.3.2#803003)