[ https://issues.apache.org/jira/browse/IMPALA-733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Armstrong updated IMPALA-733:
---------------------------------
    Labels: supportability  (was: )

> Improve Parquet error handling for low disk space
> -------------------------------------------------
>
>                 Key: IMPALA-733
>                 URL: https://issues.apache.org/jira/browse/IMPALA-733
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 1.2.3
>         Environment: Less than 1GB free on the filesystem where HDFS resides.
>            Reporter: John Russell
>            Priority: Minor
>              Labels: supportability
>
> If HDFS has less than 1 GB free (or I presume whatever value is set in the
> PARQUET_FILE_SIZE query option), INSERT into a Parquet table fails even for
> tiny amounts of data. That might be unavoidable, but the error should be
> communicated more clearly to the user.
>
> INSERT ... VALUES reports that N rows were inserted (no error at all), but
> the expected data is missing when the table is queried.
>
> INSERT ... SELECT gives a cryptic error message but still reports that the
> rows were inserted, although they aren't.
>
> Repro:
>
> About 400MB free. (This is a VM that keeps getting filled up by
> Impala-related logs.)
>
> $ df -k .
> Filesystem     1K-blocks     Used Available Use% Mounted on
> /dev/vda1       24607156 23961976    395184  99% /
>
> I was going to answer a question on the mailing list by showing an INSERT
> going from an unpartitioned to a partitioned table.
>
> [localhost:21000] > create table unpart (year int, s string) stored as parquet;
> Query: create table unpart (year int, s string) stored as parquet
> Returned 0 row(s) in 0.12s
>
> INSERT ... VALUES looks like it succeeds, but the data isn't really there.
>
> [localhost:21000] > insert into unpart values (2013,'Happy'),(2014,'New Year');
> Query: insert into unpart values (2013,'Happy'),(2014,'New Year')
> Inserted 2 rows in 0.22s
> [localhost:21000] > select * from unpart;
> Query: select * from unpart
> Returned 0 row(s) in 0.22s
> [localhost:21000] > select * from unpart;
> Query: select * from unpart
> Returned 0 row(s) in 0.22s
>
> Copying the data out of a text table, the error is reported but it doesn't
> say specifically "out of space". And the "Inserted 2 rows" message raises the
> hope the data made it in, but it didn't.
>
> [localhost:21000] > insert into unpart select * from t1;
> Query: insert into unpart select * from t1
> ERRORS ENCOUNTERED DURING EXECUTION: Backend 0:Failed to close HDFS file:
> hdfs://127.0.0.1:8020/user/hive/warehouse/partitioning.db/unpart/.impala_insert_staging/284cf98f761aec95_5712ef093b357195//.2903970254304242837-6274340053807624598_1840160694_dir/2903970254304242837-6274340053807624598_1083629803_data.0
> Error(255): Unknown error 255
> Inserted 2 rows in 0.34s
> [localhost:21000] > select * from unpart;
> Query: select * from unpart
> Returned 0 row(s) in 0.22s
>
> After all this, the data directory contains a leftover staging subdirectory
> (empty) and a zero-byte data file:
>
> $ hdfs dfs -ls hdfs://127.0.0.1:8020/user/hive/warehouse/partitioning.db/unpart
> Found 2 items
> drwxrwxrwx   - impala supergroup          0 2014-01-08 11:39 hdfs://127.0.0.1:8020/user/hive/warehouse/partitioning.db/unpart/.impala_insert_staging
> -rw-r--r--   1 impala supergroup          0 2014-01-08 11:39 hdfs://127.0.0.1:8020/user/hive/warehouse/partitioning.db/unpart/3188829493227009611-3605612775229973420_1967882694_data.0
>
> Suggestions:
> - Make INSERT ... VALUES detect/report the HDFS error trying to write the
>   block. Don't report number of rows inserted.
> - Make INSERT ... SELECT error clearer, either suggest it could be
>   out-of-space or do some followup check for $(PARQUET_FILE_SIZE) space free.
>   Don't report number of rows inserted.
> - Be cleaner about leftover staging directories and empty data files.
>   (Shouldn't the data file stay in the staging directory until it's
>   successfully closed?)
> - Whatever distributed checking is needed so the error is handled if it's
>   a remote node that runs out of space, rather than the coordinator node like
>   in this case with a single VM.
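
The suggestions about a follow-up free-space check and about keeping the data file in the staging directory until it closes cleanly can be illustrated outside Impala. The sketch below is a minimal Python example on a local filesystem, not Impala's backend code or the HDFS API. The names has_room and staged_write, the staging directory prefix, and the 1 GB default (taken from the figure quoted in the report, not read from PARQUET_FILE_SIZE) are all assumptions made for illustration.

```python
import os
import shutil
import tempfile

# Assumption: 1 GB, mirroring the figure quoted in the report.
# It is a stand-in for PARQUET_FILE_SIZE, not a value read from Impala.
ASSUMED_PARQUET_FILE_SIZE = 1 << 30


def has_room(directory, required_bytes=ASSUMED_PARQUET_FILE_SIZE):
    """Return True if the filesystem holding `directory` has enough free bytes."""
    return shutil.disk_usage(directory).free >= required_bytes


def staged_write(table_dir, file_name, payload,
                 required_bytes=ASSUMED_PARQUET_FILE_SIZE):
    """Write payload into a staging directory and publish it only after a clean close.

    Returns the number of bytes published. Raises OSError on failure, in which
    case nothing is published and the staging directory is removed.
    """
    if not has_room(table_dir, required_bytes):
        # Name the likely cause instead of surfacing a bare error code.
        raise OSError(
            f"possibly out of space: fewer than {required_bytes} bytes "
            f"free under {table_dir}"
        )
    staging = tempfile.mkdtemp(prefix=".insert_staging_", dir=table_dir)
    try:
        staged = os.path.join(staging, file_name)
        with open(staged, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())
        # Reached only if the write, sync, and close all succeeded.
        os.replace(staged, os.path.join(table_dir, file_name))
        return len(payload)
    finally:
        # Clean up the staging directory on success and on failure alike.
        shutil.rmtree(staging, ignore_errors=True)
```

A caller would report an "inserted" count only from the return value of staged_write and would surface the OSError message otherwise, so a failed write can never be followed by a success message.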