I'm using a managed notebook instance from GCP.
It seems those already come with cloudpickle==1.2.2 installed as soon as you
provision them. apache-beam[gcp] then installs dill==0.3.1.1. I'm going to
try uninstalling cloudpickle before installing apache-beam and see if this
fixes the problem.
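
A minimal sketch for confirming which of these packages are present in the
notebook environment before and after the reinstall. The helper name
installed_versions is hypothetical, and importlib.metadata assumes Python 3.8
or newer; on an older interpreter such as 3.5, pip freeze gives the same
information.

```python
from importlib import metadata  # assumption: Python 3.8+ in the notebook


def installed_versions(packages):
    """Return {package: version or None} for each requested distribution."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # The three packages discussed in this thread.
    report = installed_versions(["apache-beam", "dill", "cloudpickle"])
    for name, version in report.items():
        print(f"{name}: {version or 'not installed'}")
```

Running it once before uninstalling cloudpickle and once after installing
apache-beam should show whether cloudpickle is still present alongside dill.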

Thank you

On Tue, Feb 4, 2020 at 11:54 AM Valentyn Tymofieiev <valen...@google.com>
wrote:

> The fact that you have cloudpickle==1.2.2 further confirms that you may
> be hitting the same error as
> https://stackoverflow.com/questions/42960637/python-3-5-dill-pickling-unpickling-on-different-servers-keyerror-classtype
>  .
>
> Could you try to start over with a clean virtual environment?
>
> On Tue, Feb 4, 2020 at 11:46 AM Alan Krumholz <alan.krumh...@betterup.co>
> wrote:
>
>> Hi Valentyn,
>>
>> Here is the pip freeze output from my machine (note that the error occurs
>> in Dataflow; the job runs fine on my machine):
>>
>> ansiwrap==0.8.4
>> apache-beam==2.19.0
>> arrow==0.15.5
>> asn1crypto==1.3.0
>> astroid==2.3.3
>> astropy==3.2.3
>> attrs==19.3.0
>> avro-python3==1.9.1
>> azure-common==1.1.24
>> azure-storage-blob==2.1.0
>> azure-storage-common==2.1.0
>> backcall==0.1.0
>> bcolz==1.2.1
>> binaryornot==0.4.4
>> bleach==3.1.0
>> boto3==1.11.9
>> botocore==1.14.9
>> cachetools==3.1.1
>> certifi==2019.11.28
>> cffi==1.13.2
>> chardet==3.0.4
>> Click==7.0
>> cloudpickle==1.2.2
>> colorama==0.4.3
>> configparser==4.0.2
>> confuse==1.0.0
>> cookiecutter==1.7.0
>> crcmod==1.7
>> cryptography==2.8
>> cycler==0.10.0
>> daal==2019.0
>> datalab==1.1.5
>> decorator==4.4.1
>> defusedxml==0.6.0
>> dill==0.3.1.1
>> distro==1.0.1
>> docker==4.1.0
>> docopt==0.6.2
>> docutils==0.15.2
>> entrypoints==0.3
>> enum34==1.1.6
>> fairing==0.5.3
>> fastavro==0.21.24
>> fasteners==0.15
>> fsspec==0.6.2
>> future==0.18.2
>> gcsfs==0.6.0
>> gitdb2==2.0.6
>> GitPython==3.0.5
>> google-api-core==1.16.0
>> google-api-python-client==1.7.11
>> google-apitools==0.5.28
>> google-auth==1.11.0
>> google-auth-httplib2==0.0.3
>> google-auth-oauthlib==0.4.1
>> google-cloud-bigquery==1.17.1
>> google-cloud-bigtable==1.0.0
>> google-cloud-core==1.2.0
>> google-cloud-dataproc==0.6.1
>> google-cloud-datastore==1.7.4
>> google-cloud-language==1.3.0
>> google-cloud-logging==1.14.0
>> google-cloud-monitoring==0.31.1
>> google-cloud-pubsub==1.0.2
>> google-cloud-secret-manager==0.1.1
>> google-cloud-spanner==1.13.0
>> google-cloud-storage==1.25.0
>> google-cloud-translate==2.0.0
>> google-compute-engine==20191210.0
>> google-resumable-media==0.4.1
>> googleapis-common-protos==1.51.0
>> grpc-google-iam-v1==0.12.3
>> grpcio==1.26.0
>> h5py==2.10.0
>> hdfs==2.5.8
>> html5lib==1.0.1
>> htmlmin==0.1.12
>> httplib2==0.12.0
>> icc-rt==2020.0.133
>> idna==2.8
>> ijson==2.6.1
>> imageio==2.6.1
>> importlib-metadata==1.4.0
>> intel-numpy==1.15.1
>> intel-openmp==2020.0.133
>> intel-scikit-learn==0.19.2
>> intel-scipy==1.1.0
>> ipykernel==5.1.4
>> ipython==7.9.0
>> ipython-genutils==0.2.0
>> ipython-sql==0.3.9
>> ipywidgets==7.5.1
>> isort==4.3.21
>> jedi==0.16.0
>> Jinja2==2.11.0
>> jinja2-time==0.2.0
>> jmespath==0.9.4
>> joblib==0.14.1
>> json5==0.8.5
>> jsonschema==3.2.0
>> jupyter==1.0.0
>> jupyter-aihub-deploy-extension==0.1
>> jupyter-client==5.3.4
>> jupyter-console==6.1.0
>> jupyter-contrib-core==0.3.3
>> jupyter-contrib-nbextensions==0.5.1
>> jupyter-core==4.6.1
>> jupyter-highlight-selected-word==0.2.0
>> jupyter-http-over-ws==0.0.7
>> jupyter-latex-envs==1.4.6
>> jupyter-nbextensions-configurator==0.4.1
>> jupyterlab==1.2.6
>> jupyterlab-git==0.9.0
>> jupyterlab-server==1.0.6
>> keyring==10.1
>> keyrings.alt==1.3
>> kiwisolver==1.1.0
>> kubernetes==10.0.1
>> lazy-object-proxy==1.4.3
>> llvmlite==0.31.0
>> lxml==4.4.2
>> Markdown==3.1.1
>> MarkupSafe==1.1.1
>> matplotlib==3.0.3
>> mccabe==0.6.1
>> missingno==0.4.2
>> mistune==0.8.4
>> mkl==2019.0
>> mkl-fft==1.0.6
>> mkl-random==1.0.1.1
>> mock==2.0.0
>> monotonic==1.5
>> more-itertools==8.1.0
>> nbconvert==5.6.1
>> nbdime==1.1.0
>> nbformat==5.0.4
>> networkx==2.4
>> nltk==3.4.5
>> notebook==6.0.3
>> numba==0.47.0
>> numpy==1.15.1
>> oauth2client==3.0.0
>> oauthlib==3.1.0
>> opencv-python==4.1.2.30
>> oscrypto==1.2.0
>> packaging==20.1
>> pandas==0.25.3
>> pandas-profiling==1.4.0
>> pandocfilters==1.4.2
>> papermill==1.2.1
>> parso==0.6.0
>> pathlib2==2.3.5
>> pbr==5.4.4
>> pexpect==4.8.0
>> phik==0.9.8
>> pickleshare==0.7.5
>> Pillow-SIMD==6.2.2.post1
>> pipdeptree==0.13.2
>> plotly==4.5.0
>> pluggy==0.13.1
>> poyo==0.5.0
>> prettytable==0.7.2
>> prometheus-client==0.7.1
>> prompt-toolkit==2.0.10
>> protobuf==3.11.2
>> psutil==5.6.7
>> ptyprocess==0.6.0
>> py==1.8.1
>> pyarrow==0.15.1
>> pyasn1==0.4.8
>> pyasn1-modules==0.2.8
>> pycparser==2.19
>> pycrypto==2.6.1
>> pycryptodomex==3.9.6
>> pycurl==7.43.0
>> pydaal==2019.0.0.20180713
>> pydot==1.4.1
>> Pygments==2.5.2
>> pygobject==3.22.0
>> PyJWT==1.7.1
>> pylint==2.4.4
>> pymongo==3.10.1
>> pyOpenSSL==19.1.0
>> pyparsing==2.4.6
>> pyrsistent==0.15.7
>> pytest==5.3.4
>> pytest-pylint==0.14.1
>> python-apt==1.4.1
>> python-dateutil==2.8.1
>> pytz==2019.3
>> PyWavelets==1.1.1
>> pyxdg==0.25
>> PyYAML==5.3
>> pyzmq==18.1.1
>> qtconsole==4.6.0
>> requests==2.22.0
>> requests-oauthlib==1.3.0
>> retrying==1.3.3
>> rsa==4.0
>> s3transfer==0.3.2
>> scikit-image==0.15.0
>> scikit-learn==0.19.2
>> scipy==1.1.0
>> seaborn==0.9.1
>> SecretStorage==2.3.1
>> Send2Trash==1.5.0
>> simplegeneric==0.8.1
>> six==1.14.0
>> smmap2==2.0.5
>> snowflake-connector-python==2.2.0
>> SQLAlchemy==1.3.13
>> sqlparse==0.3.0
>> tbb==2019.0
>> tbb4py==2019.0
>> tenacity==6.0.0
>> terminado==0.8.3
>> testpath==0.4.4
>> textwrap3==0.9.2
>> tornado==5.1.1
>> tqdm==4.42.0
>> traitlets==4.3.3
>> typed-ast==1.4.1
>> typing==3.7.4.1
>> typing-extensions==3.7.4.1
>> unattended-upgrades==0.1
>> uritemplate==3.0.1
>> urllib3==1.24.2
>> virtualenv==16.7.9
>> wcwidth==0.1.8
>> webencodings==0.5.1
>> websocket-client==0.57.0
>> Werkzeug==0.16.1
>> whichcraft==0.6.1
>> widgetsnbextension==3.5.1
>> wrapt==1.11.2
>> zipp==1.1.0
>>
>>
>> On Tue, Feb 4, 2020 at 11:33 AM Valentyn Tymofieiev <valen...@google.com>
>> wrote:
>>
>>> I don't think there is a mismatch between dill versions here, but
>>> https://stackoverflow.com/questions/42960637/python-3-5-dill-pickling-unpickling-on-different-servers-keyerror-classtype
>>> mentions a similar error and may be related. What is the output of pip
>>> freeze on your machine (or better: pip install pipdeptree; pipdeptree)?
>>>
>>>
>>> On Tue, Feb 4, 2020 at 11:22 AM Alan Krumholz <alan.krumh...@betterup.co>
>>> wrote:
>>>
>>>> Here is a test job that sometimes fails and sometimes doesn't (but most
>>>> times it does).
>>>> There seems to be something stochastic causing this: out of several
>>>> test runs, a couple did succeed.
>>>>
>>>>
>>>> def test_error(bq_table: str) -> bool:
>>>>
>>>>     import apache_beam as beam
>>>>     from apache_beam.options.pipeline_options import PipelineOptions
>>>>
>>>>     class GenData(beam.DoFn):
>>>>         def process(self, _):
>>>>             for _ in range(20000):
>>>>                 yield {'a': 1, 'b': 2}
>>>>
>>>>
>>>>     def get_bigquery_schema():
>>>>         from apache_beam.io.gcp.internal.clients import bigquery
>>>>
>>>>         table_schema = bigquery.TableSchema()
>>>>         columns = [
>>>>             ["a","integer","nullable"],
>>>>             ["b","integer","nullable"]
>>>>         ]
>>>>
>>>>         for column in columns:
>>>>             column_schema = bigquery.TableFieldSchema()
>>>>             column_schema.name = column[0]
>>>>             column_schema.type = column[1]
>>>>             column_schema.mode = column[2]
>>>>             table_schema.fields.append(column_schema)
>>>>
>>>>         return table_schema
>>>>
>>>>     pipeline = beam.Pipeline(options=PipelineOptions(
>>>>         project='my-project',
>>>>         temp_location = 'gs://my-bucket/temp',
>>>>         staging_location = 'gs://my-bucket/staging',
>>>>         runner='DataflowRunner'
>>>>     ))
>>>>     #pipeline = beam.Pipeline()
>>>>
>>>>     (
>>>>         pipeline
>>>>         | 'Empty start' >> beam.Create([''])
>>>>         | 'Generate Data' >> beam.ParDo(GenData())
>>>>         #| 'print' >> beam.Map(print)
>>>>         | 'Write to BigQuery' >> beam.io.WriteToBigQuery(
>>>>                     project=bq_table.split(':')[0],
>>>>                     dataset=bq_table.split(':')[1].split('.')[0],
>>>>                     table=bq_table.split(':')[1].split('.')[1],
>>>>                     schema=get_bigquery_schema(),
>>>>                     create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
>>>>                     write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE)
>>>>     )
>>>>
>>>>     result = pipeline.run()
>>>>     result.wait_until_finish()
>>>>
>>>>     return True
>>>>
>>>> test_error(
>>>>     bq_table = 'my-project:my_dataset.my_table'
>>>> )
>>>>
>>>> On Tue, Feb 4, 2020 at 10:04 AM Alan Krumholz <
>>>> alan.krumh...@betterup.co> wrote:
>>>>
>>>>> I tried breaking apart my pipeline. Seems the step that breaks it is:
>>>>> beam.io.WriteToBigQuery
>>>>>
>>>>> Let me see if I can create a self-contained example that reproduces the
>>>>> failure to share with you.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> On Tue, Feb 4, 2020 at 9:53 AM Pablo Estrada <pabl...@google.com>
>>>>> wrote:
>>>>>
>>>>>> Hm that's odd. No changes to the pipeline? Are you able to share some
>>>>>> of the code?
>>>>>>
>>>>>> +Udi Meiri <eh...@google.com> do you have any idea what could be
>>>>>> going on here?
>>>>>>
>>>>>> On Tue, Feb 4, 2020 at 9:25 AM Alan Krumholz <
>>>>>> alan.krumh...@betterup.co> wrote:
>>>>>>
>>>>>>> Hi Pablo,
>>>>>>> This is strange... it doesn't seem to be the latest Beam release, as
>>>>>>> last night it was already using 2.19.0. I wonder if it was some release
>>>>>>> from the Dataflow team (not Beam related):
>>>>>>> Job type: Batch
>>>>>>> Job status: Succeeded
>>>>>>> SDK version: Apache Beam Python 3.5 SDK 2.19.0
>>>>>>> Region: us-central1
>>>>>>> Start time: February 3, 2020 at 9:28:35 PM GMT-8
>>>>>>> Elapsed time: 5 min 11 sec
>>>>>>>
>>>>>>> On Tue, Feb 4, 2020 at 9:15 AM Pablo Estrada <pabl...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Alan,
>>>>>>>> could it be that you're picking up the new Apache Beam 2.19.0
>>>>>>>> release? Could you try depending on Beam 2.18.0 to see whether the
>>>>>>>> issue only surfaces with the new release?
>>>>>>>>
>>>>>>>> If something was working and no longer works, it sounds like a bug.
>>>>>>>> This may have to do with how we pickle (dill / cloudpickle) - see this
>>>>>>>> question
>>>>>>>> https://stackoverflow.com/questions/42960637/python-3-5-dill-pickling-unpickling-on-different-servers-keyerror-classtype
>>>>>>>> Best
>>>>>>>> -P.
>>>>>>>>
>>>>>>>> On Tue, Feb 4, 2020 at 6:22 AM Alan Krumholz <
>>>>>>>> alan.krumh...@betterup.co> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I was running a dataflow job in GCP last night and it was running
>>>>>>>>> fine.
>>>>>>>>> This morning this same exact job is failing with the following
>>>>>>>>> error:
>>>>>>>>>
>>>>>>>>> Error message from worker:
>>>>>>>>> Traceback (most recent call last):
>>>>>>>>>   File "/usr/local/lib/python3.5/site-packages/apache_beam/internal/pickler.py", line 286, in loads
>>>>>>>>>     return dill.loads(s)
>>>>>>>>>   File "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 275, in loads
>>>>>>>>>     return load(file, ignore, **kwds)
>>>>>>>>>   File "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 270, in load
>>>>>>>>>     return Unpickler(file, ignore=ignore, **kwds).load()
>>>>>>>>>   File "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 472, in load
>>>>>>>>>     obj = StockUnpickler.load(self)
>>>>>>>>>   File "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 577, in _load_type
>>>>>>>>>     return _reverse_typemap[name]
>>>>>>>>> KeyError: 'ClassType'
>>>>>>>>>
>>>>>>>>> During handling of the above exception, another exception occurred:
>>>>>>>>>
>>>>>>>>> Traceback (most recent call last):
>>>>>>>>>   File "/usr/local/lib/python3.5/site-packages/dataflow_worker/batchworker.py", line 648, in do_work
>>>>>>>>>     work_executor.execute()
>>>>>>>>>   File "/usr/local/lib/python3.5/site-packages/dataflow_worker/executor.py", line 176, in execute
>>>>>>>>>     op.start()
>>>>>>>>>   File "apache_beam/runners/worker/operations.py", line 649, in apache_beam.runners.worker.operations.DoOperation.start
>>>>>>>>>   File "apache_beam/runners/worker/operations.py", line 651, in apache_beam.runners.worker.operations.DoOperation.start
>>>>>>>>>   File "apache_beam/runners/worker/operations.py", line 652, in apache_beam.runners.worker.operations.DoOperation.start
>>>>>>>>>   File "apache_beam/runners/worker/operations.py", line 261, in apache_beam.runners.worker.operations.Operation.start
>>>>>>>>>   File "apache_beam/runners/worker/operations.py", line 266, in apache_beam.runners.worker.operations.Operation.start
>>>>>>>>>   File "apache_beam/runners/worker/operations.py", line 597, in apache_beam.runners.worker.operations.DoOperation.setup
>>>>>>>>>   File "apache_beam/runners/worker/operations.py", line 602, in apache_beam.runners.worker.operations.DoOperation.setup
>>>>>>>>>   File "/usr/local/lib/python3.5/site-packages/apache_beam/internal/pickler.py", line 290, in loads
>>>>>>>>>     return dill.loads(s)
>>>>>>>>>   File "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 275, in loads
>>>>>>>>>     return load(file, ignore, **kwds)
>>>>>>>>>   File "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 270, in load
>>>>>>>>>     return Unpickler(file, ignore=ignore, **kwds).load()
>>>>>>>>>   File "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 472, in load
>>>>>>>>>     obj = StockUnpickler.load(self)
>>>>>>>>>   File "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 577, in _load_type
>>>>>>>>>     return _reverse_typemap[name]
>>>>>>>>> KeyError: 'ClassType'
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> If I use a local runner it still runs fine.
>>>>>>>>> Anyone else experiencing something similar today? (or know how to
>>>>>>>>> fix this?)
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>>
>>>>>>>>
