Hi,

I have spent several hour digging into strange issue with DirectRunner, that manifested as non-deterministic run of pipeline. The pipeline contains basically only single stateful ParDo, which adds elements into state and after some timeout flushes these elements into output. The issues was, that sometimes (very often) when the timer fired, the state appeared to be empty, although I actually added something into the state. I will skip details, but the problem boils down to the fact, that StateSpecs hash Coder into hashCode - e.g.

    @Override
    public int hashCode() {
      return Objects.hash(getClass(), coder);
    }

in ValueStateSpec. Now, when Coder doesn't have hashCode and equals implemented (and there are some of those in the codebase itself - e.g. SchemaCoder), it all blows up in a very hard-to-debug manner. So the proposal is - either to add abstract hashCode and equals to Coder, or don't hash the Coder into hashCode of StateSpecs (we can generate unique ID for each StateSpec instance for example).

Any thoughts about which path to follow? Or maybe both? :)

Jan


Reply via email to