alihan-synnada opened a new issue, #13896:
URL: https://github.com/apache/datafusion/issues/13896
### Describe the bug

Attempting to download the IMDB dataset fails with the following error:

```
tar: Error opening archive: Unrecognized archive format
```

An `IMDB.tgz` file is created, but it contains the following HTML instead of an archive:

```html
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>404 Not Found</title>
</head><body>
<h1>Not Found</h1>
<p>The requested URL was not found on this server.</p>
</body></html>
```

It seems the dataset has been removed or is unavailable.

### To Reproduce

Run `benchmarks/bench.sh data imdb`

### Expected behavior

It should download the dataset, extract the CSV files, and convert them to Parquet.

### Additional context

The related part of `bench.sh`:
https://github.com/apache/datafusion/blob/6cfd1cf1e030ccfe3b17621cc51fdcefcceae018/benchmarks/bench.sh#L458-L463
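The `tar` error follows from the file contents: the server answered with an HTML 404 page, which was saved under the `.tgz` name, so `tar` finds no recognizable archive. The following is a minimal sketch of a pre-extraction check that would surface this case with a clearer message. It is written in Python for illustration and is not taken from `bench.sh`; the helper name `is_gzip_file` is hypothetical. It relies only on the fact that every gzip stream (including `.tgz` / `.tar.gz` files) begins with the magic bytes `1f 8b`.

```python
import gzip
import os
import tempfile

# Every gzip stream (including .tgz / .tar.gz files) begins with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def is_gzip_file(path: str) -> bool:
    """Hypothetical helper: True if the file at `path` starts with the gzip magic bytes."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Simulates the failure in this issue: a 404 HTML page saved as IMDB.tgz.
        html_path = os.path.join(tmp, "IMDB.tgz")
        with open(html_path, "w") as f:
            f.write('<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">\n')

        # A genuine gzip file, for comparison.
        gz_path = os.path.join(tmp, "ok.tgz")
        with gzip.open(gz_path, "wb") as f:
            f.write(b"placeholder")

        print(is_gzip_file(html_path))  # False
        print(is_gzip_file(gz_path))    # True
```

A check like this only reports that the download is not a gzip archive; it does not restore the missing dataset. If the script downloads with `curl`, passing `--fail` is another way to stop before extraction, since `curl` then exits with an error on HTTP status codes of 400 or above instead of writing the error page to the output file.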