[ 
https://issues.apache.org/jira/browse/BEAM-13376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17491183#comment-17491183
 ] 

Ning edited comment on BEAM-13376 at 2/11/22, 10:30 PM:
--------------------------------------------------------

[~wangrobert] In that case, this ticket doesn't sound like a bug in the 
Bigtable client. The client is just not being used as intended.

The Bigtable client bug in BEAM-13606, by contrast, is very generic: it can 
affect any code that invokes row mutation, including the status for-loop in 
their own sample snippet.

The problems on the Bigtable client side include:
* The client suppresses retryable errors, which causes data loss because the 
invoker (Beam I/O) doesn't know which rows failed.
* When retryable errors are suppressed, None is returned instead of a Status, 
so the for-loop in the [sample 
code](https://github.com/googleapis/python-bigtable/blob/main/samples/snippets/writes/write_batch.py#L40)
 will blow up when it tries to access `None.code`.
* The client's worker is what raises the retryable errors that are later 
suppressed. Note that a retryable error (such as deadline_exceeded or 
unavailable) for one row in a batch pollutes the other rows in that batch, 
which may have succeeded or run into non-retryable errors. Once a retryable 
error is raised, the worker try-except-breaks at that point and retries the 
whole batch; eventually the worker gives up and bubbles up the error, which is 
then suppressed, and None is returned for all rows in that batch, leaving the 
whole system in a corrupted state.
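
To illustrate the second and third points, here is a minimal sketch of how an 
invoker could read per-row results defensively. It is not the client's or 
Beam's actual code: `Status` is a hypothetical stand-in for the status objects 
the client returns, `classify_statuses` is a made-up helper name, and the 
retryable set only contains the gRPC codes for the two errors named above 
(4 = DEADLINE_EXCEEDED, 14 = UNAVAILABLE).

```python
from collections import namedtuple

# Hypothetical stand-in for the per-row status objects returned by the client,
# so the sketch runs without the Bigtable library installed.
Status = namedtuple("Status", ["code", "message"])

OK = 0
# Assumption: only the two retryable errors named in this comment.
RETRYABLE_CODES = {4, 14}  # DEADLINE_EXCEEDED, UNAVAILABLE


def classify_statuses(statuses):
    """Group row indices by outcome instead of assuming every entry is a Status."""
    result = {"ok": [], "retryable": [], "permanent": [], "unknown": []}
    for i, status in enumerate(statuses):
        if status is None:
            # No status came back for this row: the outcome is unknown,
            # so it must not be counted as a successful write.
            result["unknown"].append(i)
        elif status.code == OK:
            result["ok"].append(i)
        elif status.code in RETRYABLE_CODES:
            result["retryable"].append(i)
        else:
            result["permanent"].append(i)
    return result


statuses = [
    Status(0, ""),
    None,
    Status(4, "deadline exceeded"),
    Status(5, "not found"),
]
summary = classify_statuses(statuses)
print(summary)
```

Keeping the outcomes per row is the point of the sketch: only the rows under 
"retryable" and "unknown" would need another attempt, and a None entry is 
surfaced to the caller rather than crashing on `None.code` or being silently 
dropped.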



> Missing error for nonexistent column family BigTable
> ----------------------------------------------------
>
>                 Key: BEAM-13376
>                 URL: https://issues.apache.org/jira/browse/BEAM-13376
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-gcp
>            Reporter: PierreOberholzer
>            Priority: P1
>
> Currently, BigTable throws no error when the Column Families are not defined 
> at write time. This behavior is misleading: the user believes the job has 
> completed, but the table is empty.
> A bug was raised on BigTable:
> [https://issuetracker.google.com/issues/186053077?pli=1]
> But Beam IO should make sure that this error is logged appropriately.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
