Hi Valentyn,

Here is the pip freeze from my machine (note that the error occurs in
Dataflow; the job runs fine on my machine):

ansiwrap==0.8.4
apache-beam==2.19.0
arrow==0.15.5
asn1crypto==1.3.0
astroid==2.3.3
astropy==3.2.3
attrs==19.3.0
avro-python3==1.9.1
azure-common==1.1.24
azure-storage-blob==2.1.0
azure-storage-common==2.1.0
backcall==0.1.0
bcolz==1.2.1
binaryornot==0.4.4
bleach==3.1.0
boto3==1.11.9
botocore==1.14.9
cachetools==3.1.1
certifi==2019.11.28
cffi==1.13.2
chardet==3.0.4
Click==7.0
cloudpickle==1.2.2
colorama==0.4.3
configparser==4.0.2
confuse==1.0.0
cookiecutter==1.7.0
crcmod==1.7
cryptography==2.8
cycler==0.10.0
daal==2019.0
datalab==1.1.5
decorator==4.4.1
defusedxml==0.6.0
dill==0.3.1.1
distro==1.0.1
docker==4.1.0
docopt==0.6.2
docutils==0.15.2
entrypoints==0.3
enum34==1.1.6
fairing==0.5.3
fastavro==0.21.24
fasteners==0.15
fsspec==0.6.2
future==0.18.2
gcsfs==0.6.0
gitdb2==2.0.6
GitPython==3.0.5
google-api-core==1.16.0
google-api-python-client==1.7.11
google-apitools==0.5.28
google-auth==1.11.0
google-auth-httplib2==0.0.3
google-auth-oauthlib==0.4.1
google-cloud-bigquery==1.17.1
google-cloud-bigtable==1.0.0
google-cloud-core==1.2.0
google-cloud-dataproc==0.6.1
google-cloud-datastore==1.7.4
google-cloud-language==1.3.0
google-cloud-logging==1.14.0
google-cloud-monitoring==0.31.1
google-cloud-pubsub==1.0.2
google-cloud-secret-manager==0.1.1
google-cloud-spanner==1.13.0
google-cloud-storage==1.25.0
google-cloud-translate==2.0.0
google-compute-engine==20191210.0
google-resumable-media==0.4.1
googleapis-common-protos==1.51.0
grpc-google-iam-v1==0.12.3
grpcio==1.26.0
h5py==2.10.0
hdfs==2.5.8
html5lib==1.0.1
htmlmin==0.1.12
httplib2==0.12.0
icc-rt==2020.0.133
idna==2.8
ijson==2.6.1
imageio==2.6.1
importlib-metadata==1.4.0
intel-numpy==1.15.1
intel-openmp==2020.0.133
intel-scikit-learn==0.19.2
intel-scipy==1.1.0
ipykernel==5.1.4
ipython==7.9.0
ipython-genutils==0.2.0
ipython-sql==0.3.9
ipywidgets==7.5.1
isort==4.3.21
jedi==0.16.0
Jinja2==2.11.0
jinja2-time==0.2.0
jmespath==0.9.4
joblib==0.14.1
json5==0.8.5
jsonschema==3.2.0
jupyter==1.0.0
jupyter-aihub-deploy-extension==0.1
jupyter-client==5.3.4
jupyter-console==6.1.0
jupyter-contrib-core==0.3.3
jupyter-contrib-nbextensions==0.5.1
jupyter-core==4.6.1
jupyter-highlight-selected-word==0.2.0
jupyter-http-over-ws==0.0.7
jupyter-latex-envs==1.4.6
jupyter-nbextensions-configurator==0.4.1
jupyterlab==1.2.6
jupyterlab-git==0.9.0
jupyterlab-server==1.0.6
keyring==10.1
keyrings.alt==1.3
kiwisolver==1.1.0
kubernetes==10.0.1
lazy-object-proxy==1.4.3
llvmlite==0.31.0
lxml==4.4.2
Markdown==3.1.1
MarkupSafe==1.1.1
matplotlib==3.0.3
mccabe==0.6.1
missingno==0.4.2
mistune==0.8.4
mkl==2019.0
mkl-fft==1.0.6
mkl-random==1.0.1.1
mock==2.0.0
monotonic==1.5
more-itertools==8.1.0
nbconvert==5.6.1
nbdime==1.1.0
nbformat==5.0.4
networkx==2.4
nltk==3.4.5
notebook==6.0.3
numba==0.47.0
numpy==1.15.1
oauth2client==3.0.0
oauthlib==3.1.0
opencv-python==4.1.2.30
oscrypto==1.2.0
packaging==20.1
pandas==0.25.3
pandas-profiling==1.4.0
pandocfilters==1.4.2
papermill==1.2.1
parso==0.6.0
pathlib2==2.3.5
pbr==5.4.4
pexpect==4.8.0
phik==0.9.8
pickleshare==0.7.5
Pillow-SIMD==6.2.2.post1
pipdeptree==0.13.2
plotly==4.5.0
pluggy==0.13.1
poyo==0.5.0
prettytable==0.7.2
prometheus-client==0.7.1
prompt-toolkit==2.0.10
protobuf==3.11.2
psutil==5.6.7
ptyprocess==0.6.0
py==1.8.1
pyarrow==0.15.1
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser==2.19
pycrypto==2.6.1
pycryptodomex==3.9.6
pycurl==7.43.0
pydaal==2019.0.0.20180713
pydot==1.4.1
Pygments==2.5.2
pygobject==3.22.0
PyJWT==1.7.1
pylint==2.4.4
pymongo==3.10.1
pyOpenSSL==19.1.0
pyparsing==2.4.6
pyrsistent==0.15.7
pytest==5.3.4
pytest-pylint==0.14.1
python-apt==1.4.1
python-dateutil==2.8.1
pytz==2019.3
PyWavelets==1.1.1
pyxdg==0.25
PyYAML==5.3
pyzmq==18.1.1
qtconsole==4.6.0
requests==2.22.0
requests-oauthlib==1.3.0
retrying==1.3.3
rsa==4.0
s3transfer==0.3.2
scikit-image==0.15.0
scikit-learn==0.19.2
scipy==1.1.0
seaborn==0.9.1
SecretStorage==2.3.1
Send2Trash==1.5.0
simplegeneric==0.8.1
six==1.14.0
smmap2==2.0.5
snowflake-connector-python==2.2.0
SQLAlchemy==1.3.13
sqlparse==0.3.0
tbb==2019.0
tbb4py==2019.0
tenacity==6.0.0
terminado==0.8.3
testpath==0.4.4
textwrap3==0.9.2
tornado==5.1.1
tqdm==4.42.0
traitlets==4.3.3
typed-ast==1.4.1
typing==3.7.4.1
typing-extensions==3.7.4.1
unattended-upgrades==0.1
uritemplate==3.0.1
urllib3==1.24.2
virtualenv==16.7.9
wcwidth==0.1.8
webencodings==0.5.1
websocket-client==0.57.0
Werkzeug==0.16.1
whichcraft==0.6.1
widgetsnbextension==3.5.1
wrapt==1.11.2
zipp==1.1.0
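As a side note, here is a minimal sketch (stdlib only, not Beam-specific; the helper name `make_local_class` is just illustrative) of why Beam pickles with dill in the first place: classes defined inside a function, like the GenData DoFn in the test job quoted below, can't be serialized by plain pickle, and dill restores such type references through its internal type map — the same `_reverse_typemap` lookup that fails on the Dataflow worker:

```python
import pickle

def make_local_class():
    # Like GenData in the test job: a class defined inside a function body.
    class Local:
        pass
    return Local

LocalClass = make_local_class()

try:
    pickle.dumps(LocalClass)
    print("stdlib pickle handled it")
except (pickle.PicklingError, AttributeError) as e:
    # Plain pickle can only serialize classes importable by module path,
    # so a function-local class fails; dill works around this with its
    # own type registry (_reverse_typemap on the unpickling side).
    print("stdlib pickle failed:", type(e).__name__)
```

Since the worker resolves those type references against its own installed dill, a dill version mismatch between the launch environment and the worker is a natural first suspect for this kind of KeyError.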


On Tue, Feb 4, 2020 at 11:33 AM Valentyn Tymofieiev <valen...@google.com>
wrote:

> I don't think there is a mismatch between dill versions here, but
> https://stackoverflow.com/questions/42960637/python-3-5-dill-pickling-unpickling-on-different-servers-keyerror-classtype
> mentions a similar error and may be related. What is the output of pip freeze
> on your machine (or better: pip install pipdeptree; pipdeptree)?
>
>
> On Tue, Feb 4, 2020 at 11:22 AM Alan Krumholz <alan.krumh...@betterup.co>
> wrote:
>
>> Here is a test job that sometimes fails and sometimes doesn't (though most
>> of the time it fails).
>> There seems to be something stochastic going on, as after several tests a
>> couple of them did succeed.
>>
>>
>> def test_error(
>>     bq_table: str) -> bool:
>>
>>     import apache_beam as beam
>>     from apache_beam.options.pipeline_options import PipelineOptions
>>
>>     class GenData(beam.DoFn):
>>         def process(self, _):
>>             for _ in range(20000):
>>                 yield {'a': 1, 'b': 2}
>>
>>     def get_bigquery_schema():
>>         from apache_beam.io.gcp.internal.clients import bigquery
>>
>>         table_schema = bigquery.TableSchema()
>>         columns = [
>>             ["a", "integer", "nullable"],
>>             ["b", "integer", "nullable"]
>>         ]
>>
>>         for column in columns:
>>             column_schema = bigquery.TableFieldSchema()
>>             column_schema.name = column[0]
>>             column_schema.type = column[1]
>>             column_schema.mode = column[2]
>>             table_schema.fields.append(column_schema)
>>
>>         return table_schema
>>
>>     pipeline = beam.Pipeline(options=PipelineOptions(
>>         project='my-project',
>>         temp_location='gs://my-bucket/temp',
>>         staging_location='gs://my-bucket/staging',
>>         runner='DataflowRunner'
>>     ))
>>     #pipeline = beam.Pipeline()
>>
>>     (
>>         pipeline
>>         | 'Empty start' >> beam.Create([''])
>>         | 'Generate Data' >> beam.ParDo(GenData())
>>         #| 'print' >> beam.Map(print)
>>         | 'Write to BigQuery' >> beam.io.WriteToBigQuery(
>>             project=bq_table.split(':')[0],
>>             dataset=bq_table.split(':')[1].split('.')[0],
>>             table=bq_table.split(':')[1].split('.')[1],
>>             schema=get_bigquery_schema(),
>>             create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
>>             write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE)
>>     )
>>
>>     result = pipeline.run()
>>     result.wait_until_finish()
>>
>>     return True
>>
>> test_error(
>>     bq_table='my-project:my_dataset.my_table'
>> )
>>
>> On Tue, Feb 4, 2020 at 10:04 AM Alan Krumholz <alan.krumh...@betterup.co>
>> wrote:
>>
>>> I tried breaking apart my pipeline. The step that breaks it seems to be:
>>> beam.io.WriteToBigQuery
>>>
>>> Let me see if I can create a self-contained example that breaks, to share
>>> with you.
>>>
>>> Thanks!
>>>
>>> On Tue, Feb 4, 2020 at 9:53 AM Pablo Estrada <pabl...@google.com> wrote:
>>>
>>>> Hm that's odd. No changes to the pipeline? Are you able to share some
>>>> of the code?
>>>>
>>>> +Udi Meiri <eh...@google.com> do you have any idea what could be going
>>>> on here?
>>>>
>>>> On Tue, Feb 4, 2020 at 9:25 AM Alan Krumholz <alan.krumh...@betterup.co>
>>>> wrote:
>>>>
>>>>> Hi Pablo,
>>>>> This is strange... it doesn't seem to be the latest Beam release, as last
>>>>> night the job was already using 2.19.0. I wonder if it was some release
>>>>> from the Dataflow team (not Beam related):
>>>>>
>>>>> Job type: Batch
>>>>> Job status: Succeeded
>>>>> SDK version: Apache Beam Python 3.5 SDK 2.19.0
>>>>> Region: us-central1
>>>>> Start time: February 3, 2020 at 9:28:35 PM GMT-8
>>>>> Elapsed time: 5 min 11 sec
>>>>>
>>>>> On Tue, Feb 4, 2020 at 9:15 AM Pablo Estrada <pabl...@google.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Alan,
>>>>>> could it be that you're picking up the new Apache Beam 2.19.0
>>>>>> release? Could you try depending on beam 2.18.0 to see if the issue
>>>>>> surfaces when using the new release?
>>>>>>
>>>>>> If something was working and no longer works, it sounds like a bug.
>>>>>> This may have to do with how we pickle (dill / cloudpickle) - see this
>>>>>> question
>>>>>> https://stackoverflow.com/questions/42960637/python-3-5-dill-pickling-unpickling-on-different-servers-keyerror-classtype
>>>>>> Best
>>>>>> -P.
>>>>>>
>>>>>> On Tue, Feb 4, 2020 at 6:22 AM Alan Krumholz <
>>>>>> alan.krumh...@betterup.co> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I was running a dataflow job in GCP last night and it was running
>>>>>>> fine.
>>>>>>> This morning this same exact job is failing with the following error:
>>>>>>>
>>>>>>> Error message from worker: Traceback (most recent call last):
>>>>>>>   File "/usr/local/lib/python3.5/site-packages/apache_beam/internal/pickler.py", line 286, in loads
>>>>>>>     return dill.loads(s)
>>>>>>>   File "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 275, in loads
>>>>>>>     return load(file, ignore, **kwds)
>>>>>>>   File "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 270, in load
>>>>>>>     return Unpickler(file, ignore=ignore, **kwds).load()
>>>>>>>   File "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 472, in load
>>>>>>>     obj = StockUnpickler.load(self)
>>>>>>>   File "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 577, in _load_type
>>>>>>>     return _reverse_typemap[name]
>>>>>>> KeyError: 'ClassType'
>>>>>>>
>>>>>>> During handling of the above exception, another exception occurred:
>>>>>>>
>>>>>>> Traceback (most recent call last):
>>>>>>>   File "/usr/local/lib/python3.5/site-packages/dataflow_worker/batchworker.py", line 648, in do_work
>>>>>>>     work_executor.execute()
>>>>>>>   File "/usr/local/lib/python3.5/site-packages/dataflow_worker/executor.py", line 176, in execute
>>>>>>>     op.start()
>>>>>>>   File "apache_beam/runners/worker/operations.py", line 649, in apache_beam.runners.worker.operations.DoOperation.start
>>>>>>>   File "apache_beam/runners/worker/operations.py", line 651, in apache_beam.runners.worker.operations.DoOperation.start
>>>>>>>   File "apache_beam/runners/worker/operations.py", line 652, in apache_beam.runners.worker.operations.DoOperation.start
>>>>>>>   File "apache_beam/runners/worker/operations.py", line 261, in apache_beam.runners.worker.operations.Operation.start
>>>>>>>   File "apache_beam/runners/worker/operations.py", line 266, in apache_beam.runners.worker.operations.Operation.start
>>>>>>>   File "apache_beam/runners/worker/operations.py", line 597, in apache_beam.runners.worker.operations.DoOperation.setup
>>>>>>>   File "apache_beam/runners/worker/operations.py", line 602, in apache_beam.runners.worker.operations.DoOperation.setup
>>>>>>>   File "/usr/local/lib/python3.5/site-packages/apache_beam/internal/pickler.py", line 290, in loads
>>>>>>>     return dill.loads(s)
>>>>>>>   File "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 275, in loads
>>>>>>>     return load(file, ignore, **kwds)
>>>>>>>   File "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 270, in load
>>>>>>>     return Unpickler(file, ignore=ignore, **kwds).load()
>>>>>>>   File "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 472, in load
>>>>>>>     obj = StockUnpickler.load(self)
>>>>>>>   File "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 577, in _load_type
>>>>>>>     return _reverse_typemap[name]
>>>>>>> KeyError: 'ClassType'
>>>>>>>
>>>>>>>
>>>>>>> If I use a local runner, it still runs fine.
>>>>>>> Is anyone else experiencing something similar today? (Or does anyone
>>>>>>> know how to fix this?)
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>
