Will Berkeley created KUDU-2812:
-----------------------------------
Summary: TOCTOU problem with error reporting in kudu-spark and kudu-backup
Key: KUDU-2812
URL: https://issues.apache.org/jira/browse/KUDU-2812
Project: Kudu
Issue Type: Bug
Components: backup, spark
Affects Versions: 1.9.0
Reporter: Will Berkeley
Fix For: 1.10.0
In KuduRestore.scala we have code like this:
{noformat}
// Fail the task if there are any errors.
val errorCount = session.getPendingErrors.getRowErrors.length
if (errorCount > 0) {
  val errors =
    session.getPendingErrors.getRowErrors.take(5).map(_.getErrorStatus).mkString
  throw new RuntimeException(
    s"failed to write $errorCount rows from DataFrame to Kudu; sample errors: $errors")
}
{noformat}
There's similar code in KuduContext.scala:
{noformat}
val errorCount = pendingErrors.getRowErrors.length
if (errorCount > 0) {
  val errors =
    pendingErrors.getRowErrors.take(5).map(_.getErrorStatus).mkString
  throw new RuntimeException(
    s"failed to write $errorCount rows from DataFrame to Kudu; sample errors: $errors")
}
{noformat}
I've seen the former fail to print any sample errors. Taking a single reference to
{{session.getPendingErrors.getRowErrors}} and using that reference throughout fixes this, so
it seems like there's a TOCTOU (time-of-check to time-of-use) problem here, probably because
multiple batches can be in flight at once.
The latter is most likely vulnerable to this as well.
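A minimal, self-contained Scala sketch of the single-reference fix described above. It does not use the real Kudu client: `FakeSession`, `FakePendingErrors`, `FakeRowError`, and `ErrorReporting` are hypothetical stand-ins, and `FakeSession.getPendingErrors` is assumed to return the buffered errors and then clear them. If, as the Kudu Java client's Javadoc for `getPendingErrors` indicates, the real call also resets the pending errors, the second call in the original pattern can come back empty even without concurrent batches. The sketch joins samples with `mkString(", ")` so they stay readable, unlike the bare `mkString` in the quoted code.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-ins for the Kudu client types (NOT org.apache.kudu.client).
final case class FakeRowError(status: String) {
  def getErrorStatus: String = status
}

final class FakePendingErrors(rowErrors: Array[FakeRowError]) {
  def getRowErrors: Array[FakeRowError] = rowErrors
}

final class FakeSession {
  private val buffer = ArrayBuffer.empty[FakeRowError]

  // Simulates a flush recording a failed row.
  def recordError(e: FakeRowError): Unit = synchronized { buffer += e }

  // Assumption: returns a snapshot of the buffered errors and clears the buffer.
  def getPendingErrors: FakePendingErrors = synchronized {
    val snapshot = new FakePendingErrors(buffer.toArray)
    buffer.clear()
    snapshot
  }
}

// The real code throws a RuntimeException with this message; returning an
// Option here keeps the sketch easy to check.
object ErrorReporting {
  // Pattern from the report: the count and the sample come from two separate calls.
  def buggyMessage(session: FakeSession): Option[String] = {
    val errorCount = session.getPendingErrors.getRowErrors.length
    if (errorCount > 0) {
      val errors =
        session.getPendingErrors.getRowErrors.take(5).map(_.getErrorStatus).mkString(", ")
      Some(s"failed to write $errorCount rows from DataFrame to Kudu; sample errors: $errors")
    } else None
  }

  // Fixed pattern: call getPendingErrors once and derive both values from that snapshot.
  def fixedMessage(session: FakeSession): Option[String] = {
    val rowErrors = session.getPendingErrors.getRowErrors
    if (rowErrors.nonEmpty) {
      val errors = rowErrors.take(5).map(_.getErrorStatus).mkString(", ")
      Some(s"failed to write ${rowErrors.length} rows from DataFrame to Kudu; sample errors: $errors")
    } else None
  }
}
```

With the drain-on-read stand-in, `buggyMessage` reports the right count but an empty sample, while `fixedMessage` reports both from the same snapshot. In the real code the equivalent change would be to bind `val pendingErrors = session.getPendingErrors` once and read the count and the sample from it.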
This issue made diagnosing KUDU-2809 harder.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)