That seems like the correct fix as well. We could open up a PR and see what
the tests catch as a first pass for understanding the implications.

On Fri, May 10, 2019 at 9:31 AM Jan Lukavský <je...@seznam.cz> wrote:

> Hm, yes, the fix might be also in fixing hashCode and equals of
> SimpleStateTag, so that it doesn't hash and compare the StateSpec, but only
> the StructureId. That looks like best option to me. But I'm not sure about
> other implications this might have.
>
> Jan
> On 5/10/19 5:43 PM, Reuven Lax wrote:
>
> Ok so this sounds like a bug in the DirectRunner then?
>
> *From: *Lukasz Cwik <lc...@google.com>
> *Date: *Fri, May 10, 2019 at 8:38 AM
> *To: *dev
>
> StateSpec should not be used as a key within any maps. We should use the
>> logical name of the StateSpec relative to the DoFn as its id and should
>> only be using that id for comparisons/lookups.
>>
>> On Fri, May 10, 2019 at 1:07 AM Jan Lukavský <je...@seznam.cz> wrote:
>>
>>> I'm not sure. Generally it affects any runner that uses HashMap to store
>>> StateSpec.
>>>
>>> Jan
>>> On 5/9/19 6:32 PM, Reuven Lax wrote:
>>>
>>> Is this specific to the DirectRunner, or does it affect other runners?
>>>
>>> On Thu, May 9, 2019 at 8:13 AM Jan Lukavský <je...@seznam.cz> wrote:
>>>
>>>> Because of the use of hashCode in StateSpecs, I'd say that it is. But
>>>> it is not obvious. That's why I'd suggest to make it abstract on Coder, so
>>>> that all implementations have to override it. That's a simple solution, but
>>>> the question is - should hashCode of Coder be used that way? I think that
>>>> StateSpec instances should be equal only to itself. Then the hashCode can
>>>> be stored in the instance, e.g.
>>>>
>>>>   private final int hashCode = System.identityHashCode(this)
>>>>
>>>> and returned in hashCode(). There would be no need for Coder to
>>>> implement hashCode anymore (if there aren't any other cases, where it is
>>>> needed, in which case it would still be better to add abstract hashCode and
>>>> equals methods on Coder).
>>>>
>>>> Jan
>>>> On 5/9/19 5:04 PM, Reuven Lax wrote:
>>>>
>>>> Is a valid hashCode on Coder part of our contract or not? If it is,
>>>> then the lack of hashCode on SchemaCoder is simply a bug.
>>>>
>>>> On Thu, May 9, 2019 at 7:42 AM Jan Lukavský <je...@seznam.cz> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have spent several hour digging into strange issue with
>>>>> DirectRunner,
>>>>> that manifested as non-deterministic run of pipeline. The pipeline
>>>>> contains basically only single stateful ParDo, which adds elements
>>>>> into
>>>>> state and after some timeout flushes these elements into output. The
>>>>> issues was, that sometimes (very often) when the timer fired, the
>>>>> state
>>>>> appeared to be empty, although I actually added something into the
>>>>> state. I will skip details, but the problem boils down to the fact,
>>>>> that
>>>>> StateSpecs hash Coder into hashCode - e.g.
>>>>>
>>>>>      @Override
>>>>>      public int hashCode() {
>>>>>        return Objects.hash(getClass(), coder);
>>>>>      }
>>>>>
>>>>> in ValueStateSpec. Now, when Coder doesn't have hashCode and equals
>>>>> implemented (and there are some of those in the codebase itself - e.g.
>>>>> SchemaCoder), it all blows up in a very hard-to-debug manner. So the
>>>>> proposal is - either to add abstract hashCode and equals to Coder, or
>>>>> don't hash the Coder into hashCode of StateSpecs (we can generate
>>>>> unique
>>>>> ID for each StateSpec instance for example).
>>>>>
>>>>> Any thoughts about which path to follow? Or maybe both? :)
>>>>>
>>>>> Jan
>>>>>
>>>>>
>>>>>

Reply via email to