[ 
https://issues.apache.org/jira/browse/BEAM-11497?focusedWorklogId=536616&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-536616
 ]

ASF GitHub Bot logged work on BEAM-11497:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 15/Jan/21 19:52
            Start Date: 15/Jan/21 19:52
    Worklog Time Spent: 10m 
      Work Description: chamikaramj commented on a change in pull request #13734:
URL: https://github.com/apache/beam/pull/13734#discussion_r558559023



##########
File path: sdks/python/apache_beam/io/gcp/bigquery.py
##########
@@ -803,7 +803,7 @@ def split(self, desired_bundle_size, start_position=None, stop_position=None):
         bq.clean_up_temporary_dataset(self._get_project())
 
     for source in self.split_result:
-      yield SourceBundle(0, source, None, None)
+      yield SourceBundle(1.0, source, None, None)

Review comment:
       Can we use the size of the files to provide a better estimate of the weight?
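
To make the question concrete, here is a minimal sketch of why a weight of 0 fails and how size-based weights could behave instead. It is an illustration only, modeled on the single expression visible in the stack trace (`running_total[-1] + w / total`), not the actual code in concat_source.py. The names `cumulative_fractions` and `weights_from_sizes` are hypothetical, and the assumption that per-file sizes are available at split time is exactly that, an assumption.

```python
from typing import List


def cumulative_fractions(weights: List[float]) -> List[float]:
    """Return the running fraction of work done at each bundle boundary.

    Simplified illustration; not the implementation in concat_source.py.
    """
    total = float(sum(weights))
    running_total = [0.0]
    for w in weights:
        # When every weight is 0, total is 0.0 and this division raises
        # ZeroDivisionError, matching the last frame of the stack trace.
        running_total.append(min(1.0, running_total[-1] + w / total))
    running_total[-1] = 1.0  # Guard against rounding error.
    return running_total


def weights_from_sizes(file_sizes_bytes: List[int]) -> List[float]:
    """Hypothetical helper: use each file's size in bytes as its weight."""
    # Clamp to 1 so that a set of empty files cannot sum to zero.
    return [float(max(size, 1)) for size in file_sizes_bytes]
```

With a constant weight of 1.0, every bundle gets an equal share of the progress range. With size-based weights, a 300-byte file next to a 100-byte file would cover three quarters of the range, which is closer to the work each bundle actually represents.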




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 536616)
    Time Spent: 1h 10m  (was: 1h)

> Division By Zero errors being thrown by new BQ Source
> -----------------------------------------------------
>
>                 Key: BEAM-11497
>                 URL: https://issues.apache.org/jira/browse/BEAM-11497
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-py-gcp, runner-dataflow
>            Reporter: Pablo Estrada
>            Assignee: Pablo Estrada
>            Priority: P2
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Users are reporting division-by-zero errors raised by Beam when using the 
> BQ Source. A stack trace:
> {code:java}
> Log: [1] An exception was raised when trying to execute the workitem 2053702133423558223 :
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.7/site-packages/dataflow_worker/batchworker.py", line 649, in do_work
>     work_executor.execute()
>   File "/usr/local/lib/python3.7/site-packages/dataflow_worker/executor.py", line 179, in execute
>     op.start()
>   File "dataflow_worker/native_operations.py", line 38, in dataflow_worker.native_operations.NativeReadOperation.start
>   File "dataflow_worker/native_operations.py", line 39, in dataflow_worker.native_operations.NativeReadOperation.start
>   File "dataflow_worker/native_operations.py", line 44, in dataflow_worker.native_operations.NativeReadOperation.start
>   File "dataflow_worker/native_operations.py", line 48, in dataflow_worker.native_operations.NativeReadOperation.start
>   File "/usr/local/lib/python3.7/site-packages/dataflow_worker/workercustomsources.py", line 69, in __iter__
>     self._source.start_position, self._source.stop_position)
>   File "/usr/local/lib/python3.7/site-packages/apache_beam/io/concat_source.py", line 78, in get_range_tracker
>     start_position, stop_position, self._source_bundles)
>   File "/usr/local/lib/python3.7/site-packages/apache_beam/io/concat_source.py", line 131, in __init__
>     self._compute_cumulative_weights(source_bundles[start[0]:last]) + [1] *
>   File "/usr/local/lib/python3.7/site-packages/apache_beam/io/concat_source.py", line 154, in _compute_cumulative_weights
>     running_total.append(max(min_diff, min(1, running_total[-1] + w / total)))
> ZeroDivisionError: flo
> {code}
>  
> I suspect this issue is the cause: https://issues.apache.org/jira/browse/BEAM-2716



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
