I can reproduce this on Ubuntu on EC2: https://issues.cloudera.org/browse/IMPALA-4433
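[Editor's note: the caching step of create-load-data.sh that Tim points at below can in principle be re-run by hand. A rough sketch of what that looks like follows; the pool name and table below are illustrative guesses, not copied from the script.]

```shell
# Hypothetical sketch of re-running the table-caching step manually.
# Pool and table names are assumptions, not taken from create-load-data.sh.
hdfs cacheadmin -addPool testPool   # create an HDFS cache pool if one is missing
impala-shell -q "ALTER TABLE functional.alltypes SET CACHED IN 'testPool'"
impala-shell -q "SHOW TABLE STATS functional.alltypes"   # 'Bytes Cached' should fill in
```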
On Sun, Nov 6, 2016 at 7:24 PM, Tim Armstrong <[email protected]> wrote:
> Hi Amos,
> They should all succeed in principle, though it can be finicky - they pass
> reliably in our automation environment.
>
> With the test_hdfs_caching.py ones, probably some part of the data loading
> failed, specifically the part that caches the tables:
> https://github.com/apache/incubator-impala/blob/master/testdata/bin/create-load-data.sh#L165
> You could try running those statements by hand.
>
> The stats mismatch is more mysterious to me - maybe someone else has some ideas.
>
> On Sun, Nov 6, 2016 at 6:27 PM, Amos Bird <[email protected]> wrote:
>>
>> Are Impala's e2e tests all supposed to pass? I still get these 7 failures:
>>
>> query_test/test_hdfs_caching.py::TestHdfsCaching::test_table_is_cached
>>   [exec_option: {'disable_codegen': False, 'abort_on_error': 1,
>>   'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0}
>>   | table_format: text/none] FAILED
>> query_test/test_hdfs_caching.py::TestHdfsCaching::test_table_is_cached
>>   [exec_option: {'disable_codegen': False, 'abort_on_error': 1,
>>   'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0}
>>   | table_format: text/gzip/block] FAILED
>> [gw2] FAILED metadata/test_compute_stats.py::TestComputeStats::test_compute_stats
>>   [exec_option: {'disable_codegen': False, 'abort_on_error': 1,
>>   'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0}
>>   | table_format: text/none]
>> [gw2] FAILED metadata/test_compute_stats.py::TestComputeStats::test_compute_stats_incremental
>>   [exec_option: {'disable_codegen': False, 'abort_on_error': 1,
>>   'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0}
>>   | table_format: text/none]
>> [gw6] FAILED metadata/test_ddl.py::TestDdlStatements::test_alter_set_column_stats
>>   [exec_option: {'batch_size': 0, 'num_nodes': 0, 'sync_ddl': 0,
>>   'disable_codegen': False, 'abort_on_error': 1,
>>   'exec_single_node_rows_threshold': 0}
>>   | table_format: text/none-unique_database0]
>> [gw6] FAILED metadata/test_ddl.py::TestDdlStatements::test_truncate_table
>>   [exec_option: {'batch_size': 0, 'num_nodes': 0, 'sync_ddl': 0,
>>   'disable_codegen': False, 'abort_on_error': 1,
>>   'exec_single_node_rows_threshold': 0}
>>   | table_format: text/none-unique_database0]
>> [gw6] FAILED metadata/test_metadata_query_statements.py::TestMetadataQueryStatements::test_show_stats
>>   [exec_option: {'batch_size': 0, 'num_nodes': 0, 'sync_ddl': 0,
>>   'disable_codegen': False, 'abort_on_error': 1,
>>   'exec_single_node_rows_threshold': 0}
>>   | table_format: text/none]
>>
>> And the stats mismatch looks exactly the same as on my CentOS machine:
>> ---
>> -- executing against localhost:21000
>> show column stats alltypes_clone;
>>
>> MainThread: Comparing QueryTestResults (expected vs actual):
>> 'bigint_col','BIGINT',10,-1,8,8 == 'bigint_col','BIGINT',10,-1,8,8
>> 'bool_col','BOOLEAN',2,-1,1,1 == 'bool_col','BOOLEAN',2,-1,1,1
>> 'date_string_col','STRING',736,-1,8,8 == 'date_string_col','STRING',736,-1,8,8
>> 'double_col','DOUBLE',-1,-1,8,8 == 'double_col','DOUBLE',-1,-1,8,8
>> 'float_col','FLOAT',10,-1,4,4 == 'float_col','FLOAT',10,-1,4,4
>> 'id','INT',7505,-1,4,4 == 'id','INT',7505,-1,4,4
>> 'int_col','INT',-1,-1,4,4 == 'int_col','INT',-1,-1,4,4
>> 'month','INT',12,0,4,4 == 'month','INT',12,0,4,4
>> 'smallint_col','SMALLINT',10,-1,2,2 == 'smallint_col','SMALLINT',10,-1,2,2
>> 'string_col','STRING',10,-1,-1,-1 == 'string_col','STRING',10,-1,-1,-1
>> 'timestamp_col','TIMESTAMP',7554,-1,16,16 != 'timestamp_col','TIMESTAMP',7552,-1,16,16
>> 'tinyint_col','TINYINT',10,-1,1,1 == 'tinyint_col','TINYINT',10,-1,1,1
>> 'year','INT',2,0,4,4 == 'year','INT',2,0,4,4
>> ---
>>
>> Amos Bird writes:
>>
>>> Ah, re-login does the trick. Thanks for your help ;).
>>>
>>> However, the e2e tests report many errors.
>>>
>>> 1) The name of the directory containing the error log is strange.
>>> It literally looks like this:
>>> tests/"${RESULTS_DIR}/TEST-impala-custom-cluster.log"
>>>
>>> 2) The commit I tested is 7fc31b534d4c5cb118c559e16556a6c1ae6ca7fc
>>>
>>> 3) When executing tests/run-tests.py, it gave:
>>> -----
>>> Traceback (most recent call last):
>>>   File "./tests/run-tests.py", line 94, in <module>
>>>     test_executor.run_tests(args)
>>>   File "./tests/run-tests.py", line 63, in run_tests
>>>     exit_code = pytest.main(args)
>>>   File "/home/amos/impala/infra/python/env/local/lib/python2.7/site-packages/_pytest/config.py", line 32, in main
>>>     config = _prepareconfig(args, plugins)
>>>   File "/home/amos/impala/infra/python/env/local/lib/python2.7/site-packages/_pytest/config.py", line 78, in _prepareconfig
>>>     args = shlex.split(args)
>>>   File "/usr/lib/python2.7/shlex.py", line 279, in split
>>>     return list(lex)
>>>   File "/usr/lib/python2.7/shlex.py", line 269, in next
>>>     token = self.get_token()
>>>   File "/usr/lib/python2.7/shlex.py", line 96, in get_token
>>>     raw = self.read_token()
>>>   File "/usr/lib/python2.7/shlex.py", line 172, in read_token
>>>     raise ValueError, "No closing quotation"
>>> ValueError: No closing quotation
>>> -----
>>>
>>> 4) When executing "MAX_PYTEST_FAILURES=12345678 ./bin/run-all-tests.sh",
>>> the be and fe tests pass; the e2e tests fail a lot. Log files are attached.
>>>
>>> I'm referring to this:
>>> https://cwiki.apache.org/confluence/display/IMPALA/How+to+load+and+run+Impala+tests
>>>
>>> regards,
>>> Amos
>>>
>>> Lars Volker writes:
>>>
>>>> Yes, this is already committed to the impala-setup repo and I used it
>>>> yesterday on a fresh Ubuntu 14.04 machine with success.
>>>>
>>>> Amos, after running impala-setup you will need to re-login to make sure
>>>> the changes made to the system limits are effective. You can check them
>>>> by running "ulimit -n" in your shell.
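[Editor's note: the "No closing quotation" traceback in item 3) can be reproduced outside pytest. This pytest version passes its argument string through shlex.split(), which raises ValueError on an unbalanced quote. A minimal sketch (the argument strings are made up):]

```python
# Reproduce the shlex failure seen in run-tests.py: a quoted token that
# is never closed makes shlex.split() raise ValueError.
import shlex

good = 'query_test -k "test_foo"'
print(shlex.split(good))  # ['query_test', '-k', 'test_foo']

bad = 'query_test -k "test_foo'  # note the unclosed double quote
try:
    shlex.split(bad)
except ValueError as err:
    print("shlex error:", err)  # shlex error: No closing quotation
```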
>>>> On Wed, Nov 2, 2016 at 5:48 AM, Jim Apple <[email protected]> wrote:
>>>>> Isn't that already part of the script?
>>>>>
>>>>> https://github.com/awleblang/impala-setup/commit/56fa829c99e997585eb63fcd49cb65eb8357e679
>>>>>
>>>>> https://git-wip-us.apache.org/repos/asf?p=incubator-impala.git;a=blob;f=bin/bootstrap_development.sh;h=8c4f742ae058f8017858d2a749e8824be58bd410;hb=HEAD#l68
>>>>>
>>>>> On Tue, Nov 1, 2016 at 9:44 PM, Dimitris Tsirogiannis
>>>>> <[email protected]> wrote:
>>>>>> Hi Amos,
>>>>>>
>>>>>> You need to increase your limits (/etc/security/limits.conf) for the max
>>>>>> number of open files (nofile). Use a pretty big number (e.g. 500K) for
>>>>>> both soft and hard.
>>>>>>
>>>>>> Hope that helps.
>>>>>>
>>>>>> Dimitris
>>>>>>
>>>>>> On Tue, Nov 1, 2016 at 8:57 PM, Amos Bird <[email protected]> wrote:
>>>>>>>
>>>>>>> Hi there,
>>>>>>>
>>>>>>> After days of effort to make Impala's local tests work on my CentOS
>>>>>>> machine, I finally gave up and turned to Ubuntu. I followed this simple guide
>>>>>>> https://cwiki.apache.org/confluence/display/IMPALA/Bootstrapping+an+Impala+Development+Environment+From+Scratch
>>>>>>> on a freshly installed Ubuntu 14.04. Unfortunately there are still errors
>>>>>>> in the data loading phase. Here is the error log:
>>>>>>>
>>>>>>> ---------------------------------------------------------------------------------------------
>>>>>>> Loading Kudu TPCH (logging to /home/amos/impala/logs/data_loading/load-kudu-tpch.log)... FAILED
>>>>>>> 'load-data tpch core kudu/none/none force' failed.
>>>>>>> Tail of log:
>>>>>>> distribute by hash (c_custkey) into 9 buckets stored as kudu
>>>>>>>
>>>>>>> (load-tpch-core-impala-generated-kudu-none-none.sql):
>>>>>>>
>>>>>>> Executing HBase Command: hbase shell load-tpch-core-hbase-generated.create
>>>>>>> 16/11/02 01:07:58 INFO Configuration.deprecation: hadoop.native.lib is
>>>>>>> deprecated. Instead, use io.native.lib.available
>>>>>>> SLF4J: Class path contains multiple SLF4J bindings.
>>>>>>> SLF4J: Found binding in [jar:file:/home/amos/impala/toolchain/cdh_components/hbase-1.2.0-cdh5.10.0-SNAPSHOT/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>>>>>> SLF4J: Found binding in [jar:file:/home/amos/impala/toolchain/cdh_components/hadoop-2.6.0-cdh5.10.0-SNAPSHOT/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>>>>>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
>>>>>>> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
>>>>>>> Executing HBase Command: hbase shell post-load-tpch-core-hbase-generated.sql
>>>>>>> 16/11/02 01:08:03 INFO Configuration.deprecation: hadoop.native.lib is
>>>>>>> deprecated. Instead, use io.native.lib.available
>>>>>>> [same SLF4J multiple-bindings warnings as above]
>>>>>>> Invalidating Metadata
>>>>>>> (load-tpch-core-impala-load-generated-kudu-none-none.sql):
>>>>>>> INSERT INTO TABLE tpch_kudu.lineitem SELECT * FROM tpch.lineitem
>>>>>>>
>>>>>>> Data Loading from Impala failed with error: ImpalaBeeswaxException:
>>>>>>> Query aborted:
>>>>>>> Kudu error(s) reported, first error: Timed out: Failed to write batch of
>>>>>>> 2708 ops to tablet 84aa134fb6c24916aa16cf50f48ec557 after 329 attempt(s):
>>>>>>> Failed to write to server: (no server available): Write(tablet:
>>>>>>> 84aa134fb6c24916aa16cf50f48ec557, num_ops: 2708, num_attempts: 329)
>>>>>>> passed its deadline: Network error: recv error: Connection reset by peer
>>>>>>> (error 104)
>>>>>>> [same Kudu error repeated]
>>>>>>> Error in Kudu table 'impala::tpch_kudu.lineitem': Timed out: Failed to
>>>>>>> write batch of 2708 ops to tablet 84aa134fb6c24916aa16cf50f48ec557 after
>>>>>>> 329 attempt(s): Failed to write to server: (no server available):
>>>>>>> Write(tablet: 84aa134fb6c24916aa16cf50f48ec557, num_ops: 2708,
>>>>>>> num_attempts: 329) passed its deadline: Network error: recv error:
>>>>>>> Connection reset by peer (error 104) (1 of 2708 similar)
>>>>>>>
>>>>>>> Traceback (most recent call last):
>>>>>>>   File "/home/amos/impala/bin/load-data.py", line 158, in exec_impala_query_from_file
>>>>>>>     result = impala_client.execute(query)
>>>>>>>   File "/home/amos/impala/tests/beeswax/impala_beeswax.py", line 173, in execute
>>>>>>>     handle = self.__execute_query(query_string.strip(), user=user)
>>>>>>>   File "/home/amos/impala/tests/beeswax/impala_beeswax.py", line 339, in __execute_query
>>>>>>>     self.wait_for_completion(handle)
>>>>>>>   File "/home/amos/impala/tests/beeswax/impala_beeswax.py", line 359, in wait_for_completion
>>>>>>>     raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
>>>>>>> ImpalaBeeswaxException: ImpalaBeeswaxException:
>>>>>>> Query aborted:
>>>>>>> [same Kudu timeout errors as above]
>>>>>>>
>>>>>>> Error in /home/amos/impala/testdata/bin/create-load-data.sh at line 45:
>>>>>>> while [ -n "$*" ]
>>>>>>> + cleanup
>>>>>>> + rm -rf /tmp/tmp.hMzGwIcUo3
>>>>>>> ---------------------------------------------------------------------------------------------
>>>>>>>
>>>>>>> This kinda blocks my patch's rebasing. Any help is much appreciated!
>>>>>>>
>>>>>>> regards,
>>>>>>> Amos
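[Editor's note: Dimitris's open-file-limit advice above amounts to a config fragment along these lines; the 500000 value follows his "500K for both soft and hard" suggestion, and the wildcard-user form is an assumption about how the reader wants to scope it.]

```
# /etc/security/limits.conf -- raise the max open files (nofile) limit
# to 500K for both soft and hard, per the suggestion above:
*  soft  nofile  500000
*  hard  nofile  500000
```

After re-logging in, running `ulimit -n` in a fresh shell should report the new value.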
