[ 
https://issues.apache.org/jira/browse/BEAM-7984?focusedWorklogId=301620&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301620
 ]

ASF GitHub Bot logged work on BEAM-7984:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 27/Aug/19 01:36
            Start Date: 27/Aug/19 01:36
    Worklog Time Spent: 10m 
      Work Description: chadrik commented on issue #9344: [BEAM-7984] The coder 
returned for typehints.List should be IterableCoder
URL: https://github.com/apache/beam/pull/9344#issuecomment-525098325
 
 
   > If the user specifies the type as List, should we ensure that we always 
return a List? (I.e should we explicitly state our assumption here that 
IterableCoder returns a list and also adda note in the implementation of 
IterableCoderImpl that if this changes we need to create a separate ListCoder? 
Or should we go ahead and create a separate ListCoder now?
   
   I dug into this a bit more and realized that I was mistaken about my 
assertion that `IterableCoderImpl.decode` always return a list.  Here's the 
relevant code from the base class:
   
   ```python
   class SequenceCoderImpl(StreamCoderImpl):
   
     def __init__(self, elem_coder,
                  read_state=None, write_state=None, write_state_threshold=0):
       self._elem_coder = elem_coder
       self._read_state = read_state
       self._write_state = write_state
       self._write_state_threshold = write_state_threshold
   
     ...
   
     def decode_from_stream(self, in_stream, nested):
       size = in_stream.read_bigendian_int32()
   
       if size >= 0:
         elements = [self._elem_coder.decode_from_stream(in_stream, True)
                     for _ in range(size)]
       else:
         elements = []
         count = in_stream.read_var_int64()
         while count > 0:
           for _ in range(count):
             elements.append(self._elem_coder.decode_from_stream(in_stream, 
True))
           count = in_stream.read_var_int64()
   
         if count == -1:
           if self._read_state is None:
             raise ValueError(
                 'Cannot read state-written iterable without state reader.')
   
           state_token = in_stream.read_all(True)
           elements = _ConcatSequence(
               elements, self._read_state(state_token, self._elem_coder))
   
       return self._construct_from_sequence(elements)
   ```
   
   `_ConcatSequence` and `self._read_state` are both iterators. 
   
   Some options:
   
   - create a `ListCoder`:   the problem is this won't be portable, which 
defeats the original motivation here, for external transforms
   - make `IterableCoder` always return a `list`:   It currently returns an 
iterator in the (rare?) case that `read_state` is provided, in which case it 
_seems_ like it progressively yields elements as they are decoded, but I'm not 
100% clear on that.  If that _is_ the case, converting to a list on decode 
would undermine the usefulness of this feature. 
   - leave it as is:  `list` is iterable after all, and in the case that the 
user provides a list to encode, they will get a list back, since there 
shouldn't be a `read_state`. 
   
   So as it is, `IterableCoder` is true to its name in that it really only 
guarantees that the decoded value is iterable.  However, as the tests that I 
added imply, in the typical case, the result is actually a list.  So I think 
the best thing to do is to for me to update the test that I added to assert 
that the returned type is a list, so that if that ever changes, someone is 
forced to consider it.
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 301620)
    Time Spent: 0.5h  (was: 20m)

> [python] The coder returned for typehints.List should be IterableCoder
> ----------------------------------------------------------------------
>
>                 Key: BEAM-7984
>                 URL: https://issues.apache.org/jira/browse/BEAM-7984
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-py-core
>            Reporter: Chad Dombrova
>            Assignee: Chad Dombrova
>            Priority: Major
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> IterableCoder encodes a list and decodes to list, but 
> typecoders.registry.get_coder(typehints.List[bytes]) returns a 
> FastPrimitiveCoder.  I don't see any reason why this would be advantageous. 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

Reply via email to