[ https://issues.apache.org/jira/browse/IMPALA-13015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Abhishek Rawat resolved IMPALA-13015.
-------------------------------------
Target Version: Impala 4.4.0
Resolution: Fixed
> Dataload fails due to concurrency issue with test.jceks
> -------------------------------------------------------
>
> Key: IMPALA-13015
> URL: https://issues.apache.org/jira/browse/IMPALA-13015
> Project: IMPALA
> Issue Type: Bug
> Components: Infrastructure
> Affects Versions: Impala 4.4.0
> Reporter: Joe McDonnell
> Assignee: Abhishek Rawat
> Priority: Major
> Labels: flaky
>
> When running dataload locally, it fails with this error:
> {noformat}
> Traceback (most recent call last):
>   File "/home/joemcdonnell/upstream/Impala/bin/load-data.py", line 523, in <module>
>     if __name__ == "__main__": main()
>   File "/home/joemcdonnell/upstream/Impala/bin/load-data.py", line 322, in main
>     os.remove(jceks_path)
> OSError: [Errno 2] No such file or directory: '/home/joemcdonnell/upstream/Impala/testdata/jceks/test.jceks'
> Background task Loading functional-query data (pid 501094) failed.
> {noformat}
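> testdata/bin/create-load-data.sh calls bin/load-data.py for functional,
> TPC-H, and TPC-DS in parallel. If two of those processes both see that
> test.jceks exists, the one that calls os.remove() second fails with the
> ENOENT error above. The racy check-then-remove logic is: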
> {noformat}
> jceks_path = TESTDATA_JCEKS_DIR + "/test.jceks"
> if os.path.exists(jceks_path):
>   os.remove(jceks_path)
> {noformat}
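The bug is a classic check-then-act race: between `os.path.exists()` and `os.remove()` another process can delete the file. If the cleanup had to stay in bin/load-data.py, a minimal sketch of a race-tolerant delete could skip the existence check and instead ignore ENOENT (the errno 2 in the traceback). It assumes only the standard `os` and `errno` modules; the helper name `remove_if_exists` is hypothetical and not part of Impala.

```python
import errno
import os


def remove_if_exists(path):
    """Hypothetical helper: delete `path`, treating "already gone" as success.

    There is no separate existence check, so there is no window between the
    check and the delete. If a concurrent process removes the file first,
    os.remove() raises ENOENT, which is ignored here.

    Returns True if this call deleted the file, False if it was already absent.
    """
    try:
        os.remove(path)
        return True
    except OSError as e:
        # ENOENT (errno 2) means another process got there first.
        # Any other error (e.g. permission denied) is still re-raised.
        if e.errno != errno.ENOENT:
            raise
        return False
```

With this, `remove_if_exists(TESTDATA_JCEKS_DIR + "/test.jceks")` would replace the two-line check. It only prevents the crash; it does not coordinate the parallel loads in any other way, so moving the cleanup out of the parallel path remains the more robust fix.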
> I don't see a specific reason for this to be in bin/load-data.py. It should
> be moved to a step that doesn't run in parallel. One possible location is a
> new step in testdata/bin/create-load-data.sh.
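The suggested fix is about ordering: delete test.jceks once, then start the parallel loads. The real testdata/bin/create-load-data.sh is a shell script that runs each load as a background task; the Python sketch below only models that ordering. The function names (`clean_jceks`, `load_workload`, `create_load_data`) and workload names are hypothetical, and a thread pool stands in for the background processes.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def clean_jceks(jceks_dir):
    """Hypothetical serial step: runs once, before any parallel load starts."""
    jceks_path = os.path.join(jceks_dir, "test.jceks")
    # The plain check-then-remove is safe here because nothing else is
    # touching the directory yet.
    if os.path.exists(jceks_path):
        os.remove(jceks_path)


def load_workload(name):
    """Stand-in for one bin/load-data.py run; it no longer deletes test.jceks."""
    return name + " loaded"


def create_load_data(jceks_dir, workloads):
    """Hypothetical driver: serial cleanup first, then the parallel loads."""
    clean_jceks(jceks_dir)  # once, before the pool exists
    with ThreadPoolExecutor(max_workers=len(workloads)) as pool:
        # pool.map returns results in the order of `workloads`.
        return list(pool.map(load_workload, workloads))
```

Because `clean_jceks` returns before the pool is created, no load can run while the file is being deleted, so the original exists()/remove() pair needs no extra error handling in that position.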
> This was introduced in
> [https://github.com/apache/impala/commit/9837637d9342a49288a13a421d4e749818da1432]