Hi folks,

I am running the Python SDK's PortableRunner against the Java Reference Runner job server. We could not make it work, because the Docker container fails to start with this error message:

"2018/11/16 21:38:55 Failed to retrieve staged files: failed to retrieve pickled_main_session in 3 attempts: bad MD5 for /tmp/staged/pickled_main_session: 9g/EU11J0QTfwDVbpHQhAQ==, want ; bad MD5 for /tmp/staged/pickled_main_session: 9g/EU11J0QTfwDVbpHQhAQ==, want ; bad MD5 for /tmp/staged/pickled_main_session: 9g/EU11J0QTfwDVbpHQhAQ==, want ; bad MD5 for /tmp/staged/pickled_main_session: 9g/EU11J0QTfwDVbpHQhAQ==, want "

The code that produces this error message is here: <https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/artifact/materialize.go#L173>.
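
To make the failure concrete, here is a minimal Python sketch of the kind of check the log suggests. It assumes that the downloaded file's MD5 digest is base64-encoded and compared with an expected value taken from the artifact metadata. The helper names md5_base64 and matches_expected are hypothetical and not part of Beam.

```python
import base64
import hashlib


def md5_base64(path, chunk_size=1 << 20):
    """Return the base64-encoded MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large staged files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return base64.b64encode(digest.digest()).decode("ascii")


def matches_expected(path, expected):
    """Hypothetical helper: True if the file's digest equals `expected`."""
    return md5_base64(path) == expected
```

Under that assumption, an empty expected value can never match: a base64-encoded MD5 digest is always 24 characters long, which is consistent with all three attempts in the log failing in the same way.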

The file pickled_main_session is indeed staged, but for some unknown reason the expected hash is an empty string. My hypothesis is that the job request should have included a hash, but the Python side fails to set it, which leads to the empty string.

If that hypothesis is correct, my question is: where in the Python SDK's job request code should the hash be set? A pointer to the right place would be appreciated.

That said, I also saw Ankur's recent PR#7049 <https://github.com/apache/beam/commit/1b241f9517342c73ed2f0a73251858ee67c7e191>, which changes MD5 to SHA256. That PR does not update anything in Java or Python, which makes me unsure about the hypothesis above. What did I miss? (Or maybe that is what PR#7049 should have done?)

Suggestions appreciated.

Cheers,
--
================
Ruoyun Huang