Sahil Takiar created IMPALA-9112:
------------------------------------
Summary: Consider removing hdfsExists calls when writing out files
Key: IMPALA-9112
URL: https://issues.apache.org/jira/browse/IMPALA-9112
Project: IMPALA
Issue Type: Improvement
Components: Backend
Reporter: Sahil Takiar
Assignee: Sahil Takiar
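There are a few places in the backend where we call {{hdfsExists}} before
writing out a file. This can cause issues when writing data to S3, because S3
can cache 404 Not Found responses: if a HEAD request for a key is issued before
the object exists, a later read of the newly written object may still return
404. This issue manifests itself with errors such as: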
{code:java}
ERROR: Error(s) moving partition files. First error (of 1) was: Hdfs op (RENAME
s3a://[bucket-name]/[table-name]/_impala_insert_staging/3943ae7ccf00711e_59606d8800000000/.3943ae7ccf00711e-59606d880000000b_562151879_dir/year=2015/3943ae7ccf00711e-59606d880000000b_1994902389_data.0.parq
TO
s3a://[bucket-name]/[table-name]/3943ae7ccf00711e-59606d880000000b_1994902389_data.0.parq)
failed, error was:
s3a://[bucket-name]/[table-name]/_impala_insert_staging/3943ae7ccf00711e_59606d8800000000/.3943ae7ccf00711e-59606d880000000b_562151879_dir/year=2015/3943ae7ccf00711e-59606d880000000b_1994902389_data.0.parq
Error(5): Input/output error
Root cause: AmazonS3Exception: Not Found (Service: Amazon S3; Status Code: 404;
Error Code: 404 Not Found; Request ID: []; S3 Extended Request ID: []){code}
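To illustrate why the existence probe is harmful, here is a minimal C++ sketch of a hypothetical object store with negative caching. All class and function names are invented for illustration; this is not the S3, S3A, or libhdfs API, and real S3 expiry of cached 404s is timing-dependent rather than an explicit call. A HEAD on a missing key records a cached 404, and a later read of the same key still hits that entry after the write, roughly mirroring the RENAME failure above.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical in-memory model of an object store with negative caching.
// NOT the real S3 or libhdfs API; names are illustrative only.
class NegativeCachingStore {
 public:
  // HEAD request: reports whether `key` exists. A miss is remembered in the
  // negative cache, so later lookups of the same key keep returning
  // "not found" until the cache entry expires.
  bool Head(const std::string& key) {
    if (negative_cache_.count(key) > 0) return false;
    if (objects_.count(key) > 0) return true;
    negative_cache_.insert(key);
    return false;
  }

  // PUT: stores the object. The stale negative-cache entry is NOT cleared,
  // which is what makes "check, then write, then read" fail.
  void Put(const std::string& key, const std::string& data) {
    objects_[key] = data;
  }

  // Read of the object, standing in for e.g. the source lookup of a rename.
  bool Get(const std::string& key, std::string* out) {
    if (negative_cache_.count(key) > 0) return false;  // cached 404
    auto it = objects_.find(key);
    if (it == objects_.end()) return false;
    *out = it->second;
    return true;
  }

  // Simulates the negative-cache entries eventually expiring.
  void ExpireNegativeCache() { negative_cache_.clear(); }

 private:
  std::map<std::string, std::string> objects_;
  std::set<std::string> negative_cache_;
};

// Hypothetical write path that probes for existence first (the pattern this
// issue proposes removing), then writes, then reads the file back.
bool WriteWithExistsCheck(NegativeCachingStore& store, const std::string& key) {
  if (store.Head(key)) return false;  // refuse to clobber an existing file
  store.Put(key, "data");
  std::string out;
  return store.Get(key, &out);
}

// Same hypothetical write path without the existence probe.
bool WriteWithoutExistsCheck(NegativeCachingStore& store,
                             const std::string& key) {
  store.Put(key, "data");
  std::string out;
  return store.Get(key, &out);
}
```

In this model, {{WriteWithExistsCheck}} reports failure until {{ExpireNegativeCache()}} is called, while {{WriteWithoutExistsCheck}} succeeds immediately because no 404 was ever cached for the key.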
HADOOP-13884, HADOOP-13950, and HADOOP-16490 are relevant here: the HDFS
clients allow specifying an "overwrite" option when creating a file, and setting
it can avoid doing any HEAD requests when opening a file for writing.
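A rough sketch of the overwrite approach, again using a hypothetical C++ client rather than a real API. Its {{Create}} method loosely follows the shape of Hadoop's {{FileSystem#create(Path, boolean overwrite)}}: only the non-overwrite path performs an existence probe, so a caller that always creates with overwrite set never issues a HEAD before the PUT. The class name and request counter are assumptions made for illustration.

```cpp
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical client mirroring the shape of create(path, overwrite).
// With overwrite == false it must first check whether the target exists
// (a HEAD request on S3); with overwrite == true that probe is skipped.
// Illustrative model only, not the S3A or libhdfs implementation.
class ToyObjectStoreClient {
 public:
  void Create(const std::string& path, bool overwrite) {
    if (!overwrite) {
      ++head_requests_;  // existence probe: the source of cached 404s on S3
      if (objects_.count(path) > 0) {
        throw std::runtime_error("file already exists: " + path);
      }
    }
    objects_.insert(path);  // the PUT itself
  }

  // Number of existence probes issued so far.
  int head_requests() const { return head_requests_; }

 private:
  std::set<std::string> objects_;
  int head_requests_ = 0;
};
```

The trade-off is that overwrite-style creation silently replaces an existing file, so it presumably only fits call sites where the file names are already unique, such as staging files whose names embed query and fragment IDs like those in the paths above.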
--
This message was sent by Atlassian Jira
(v8.3.4#803005)