Akash Patel created BEAM-3403:
---------------------------------

             Summary: Ingesting json file ValidationError: Expected type <type 'unicode'>
                 Key: BEAM-3403
                 URL: https://issues.apache.org/jira/browse/BEAM-3403
             Project: Beam
          Issue Type: Bug
          Components: sdk-py-core
    Affects Versions: 2.2.0
            Reporter: Akash Patel
            Assignee: Ahmet Altay


Reading a JSON file from a GCS file pattern with Beam Python SDK 2.2.0 on Dataflow produces the following warning:

{noformat}
Retry with exponential backoff: waiting for 4.21317187833 seconds before retrying report_completion_status because we caught exception: ValidationError: Expected type <type 'unicode'> for field name, found s05-s34-reify20-process-msecs (type <class 'apache_beam.utils.counters.CounterName'>)
Traceback for above exception (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/utils/retry.py", line 175, in wrapper
    return fun(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 491, in report_completion_status
    exception_details=exception_details)
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 299, in report_status
    work_executor=self._work_executor)
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/workerapiclient.py", line 316, in report_status
    append_counter(work_item_status, counter, tentative=not completed)
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/workerapiclient.py", line 43, in append_counter
    status_object, counter.name, kind, counter.accumulator, setter)
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/workerapiclient.py", line 95, in append_counter_update
    add_unstructured_name_and_kind(metric_update, metric_name, kind)
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/workerapiclient.py", line 63, in add_unstructured_name_and_kind
    metric_update.nameAndKind.name = metric_name
  File "/usr/local/lib/python2.7/dist-packages/apitools/base/protorpclite/messages.py", line 973, in __setattr__
    object.__setattr__(self, name, value)
  File "/usr/local/lib/python2.7/dist-packages/apitools/base/protorpclite/messages.py", line 1299, in __set__
    value = self.validate(value)
  File "/usr/local/lib/python2.7/dist-packages/apitools/base/protorpclite/messages.py", line 1406, in validate
    return self.__validate(value, self.validate_element)
  File "/usr/local/lib/python2.7/dist-packages/apitools/base/protorpclite/messages.py", line 1364, in __validate
    return validate_element(value)
  File "/usr/local/lib/python2.7/dist-packages/apitools/base/protorpclite/messages.py", line 1549, in validate_element
    return super(StringField, self).validate_element(value)
  File "/usr/local/lib/python2.7/dist-packages/apitools/base/protorpclite/messages.py", line 1346, in validate_element
    (self.type, name, value, type(value)))
{noformat}
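The traceback ends in apitools' StringField validation: the worker assigns the counter's name to nameAndKind.name, but the value it passes is an apache_beam.utils.counters.CounterName object rather than text, so the field rejects it. The sketch below models that check in plain Python 3 so it runs without apitools. StringFieldModel, this ValidationError class and this CounterName namedtuple (including its step_name/name split) are hypothetical stand-ins, not the real classes, and converting the name to a string before assignment is only an assumed workaround, not necessarily the fix applied upstream.

```python
from collections import namedtuple


class ValidationError(Exception):
    """Hypothetical stand-in for apitools' messages.ValidationError."""


class CounterName(namedtuple("CounterName", ["step_name", "name"])):
    """Hypothetical stand-in for apache_beam.utils.counters.CounterName:
    a structured name object that prints like a string but is not one."""
    __slots__ = ()

    def __str__(self):
        return "%s-%s" % (self.step_name, self.name)


class StringFieldModel(object):
    """Simplified model of a protorpc StringField: only text values pass."""

    def __init__(self, field_name):
        self.field_name = field_name

    def validate(self, value):
        # Reject anything that is not text, as the real field does for non-unicode.
        if not isinstance(value, str):
            raise ValidationError(
                "Expected type %r for field %s, found %s (type %r)"
                % (str, self.field_name, value, type(value)))
        return value


field = StringFieldModel("name")
counter_name = CounterName("s05-s34-reify20", "process-msecs")

try:
    # Mirrors `metric_update.nameAndKind.name = metric_name` in the traceback.
    field.validate(counter_name)
except ValidationError as exc:
    print("rejected:", exc)

# Assumed workaround: hand the field a plain string instead of the name object.
print("accepted:", field.validate(str(counter_name)))
```

The model only shows why the type check fails; whether the real worker should stringify the name or the SDK should keep passing plain strings is a design decision for the maintainers.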

The job does not fail; instead, it gets stuck trying to read the file, and the warning above is logged on every retry.

However, the same job runs fine with Beam Python SDK 2.1.1.
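The warning comes from a retry wrapper with exponential backoff (apache_beam/utils/retry.py in the traceback). Because the ValidationError is caused by the counter name's type rather than by a transient fault, retrying cannot cure it: every attempt fails the same way and only the wait changes. The sketch below illustrates this with a hypothetical retry_with_backoff helper written for Python 3; it is not Beam's retry implementation, and its retry count, delays and jitter range are arbitrary assumptions.

```python
import random


class ValidationError(Exception):
    """Hypothetical stand-in for the deterministic error in this report."""


def retry_with_backoff(fn, num_retries=4, initial_delay=1.0, factor=2.0,
                       sleep=lambda secs: None, log=print):
    """Hypothetical helper: call fn(), retrying on any exception with
    exponentially growing, jittered delays; re-raise once retries run out."""
    delay = initial_delay
    for attempt in range(num_retries + 1):
        try:
            return fn()
        except Exception as exc:
            if attempt == num_retries:
                raise  # out of retries: surface the last error to the caller
            wait = delay * random.uniform(0.5, 1.5)  # jitter around the base delay
            log("Waiting %.2f seconds before retrying because we caught "
                "exception: %s" % (wait, exc))
            sleep(wait)  # no-op by default so the sketch runs instantly
            delay *= factor  # grow the base delay for the next attempt


attempts = []


def report_status():
    """Hypothetical call that fails identically on every attempt."""
    attempts.append(1)
    raise ValidationError("Expected type <type 'unicode'> for field name")


try:
    retry_with_backoff(report_status)
except ValidationError as exc:
    print("gave up after %d attempts: %s" % (len(attempts), exc))
```

Passing sleep=time.sleep would make the helper actually wait between attempts; a transient error would eventually succeed, whereas this type error repeats on each attempt.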




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
