[ 
https://issues.apache.org/jira/browse/IMPALA-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17893831#comment-17893831
 ] 

Quanlong Huang commented on IMPALA-13448:
-----------------------------------------

[~jiangwei] Thanks for taking this! I'm adding some notes that might help.

One way to reproduce the issue is to set lineage_event_log_dir to a non-empty 
path and restrict its permissions so that Impala fails to write lineage log 
files.

Create the directory and remove all permissions on it:
{code:java}
mkdir -p /tmp/abc
chmod 0 /tmp/abc{code}
Start Impala with --lineage_event_log_dir=/tmp/abc:
{noformat}
$ bin/start-impala-cluster.py --impalad_args=--lineage_event_log_dir=/tmp/abc
21:33:36 MainThread: Found 0 impalad/0 statestored/0 catalogd process(es)
21:33:36 MainThread: Starting State Store logging to /home/quanlong/workspace/Impala/logs/cluster/statestored.INFO
21:33:36 MainThread: Starting Catalog Service logging to /home/quanlong/workspace/Impala/logs/cluster/catalogd.INFO
21:33:36 MainThread: Starting Impala Daemon logging to /home/quanlong/workspace/Impala/logs/cluster/impalad.INFO
21:33:36 MainThread: Starting Impala Daemon logging to /home/quanlong/workspace/Impala/logs/cluster/impalad_node1.INFO
21:33:36 MainThread: Starting Impala Daemon logging to /home/quanlong/workspace/Impala/logs/cluster/impalad_node2.INFO
21:33:39 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
21:33:39 MainThread: Waiting for Impalad webserver port 25000
21:33:39 MainThread: Waiting for Impalad webserver port 25000
21:33:40 MainThread: Waiting for Impalad webserver port 25000
21:33:41 MainThread: Waiting for Impalad webserver port 25000
21:33:41 MainThread: Error starting cluster
Traceback (most recent call last):
  File "bin/start-impala-cluster.py", line 1175, in <module>
    impala_cluster.wait_until_ready(expected_cluster_size, expected_num_ready_impalads)
  File "/home/quanlong/workspace/Impala/tests/common/impala_cluster.py", line 231, in wait_until_ready
    impalad.wait_for_webserver(sleep_interval, check_processes_still_running)
  File "/home/quanlong/workspace/Impala/tests/common/impala_cluster.py", line 615, in wait_for_webserver
    early_abort_fn()
  File "/home/quanlong/workspace/Impala/tests/common/impala_cluster.py", line 224, in check_processes_still_running
    assert len(self.impalads) >= expected_num_impalads
AssertionError{noformat}
The log in $IMPALA_HOME/logs/cluster/impalad.INFO only shows 
"Could not open log file", without the cause:
{code:java}
I1029 21:33:40.884531 31965 status.cc:129] Could not open log file: /tmp/abc/impala_lineage_log_1.0-1730208820884
    @          0x10c46f7  impala::Status::Status()
    @          0x1a74074  impala::SimpleLogger::FlushInternal()
    @          0x1a75538  impala::SimpleLogger::Init()
    @          0x16f7b07  impala::ImpalaServer::InitLineageLogging()
    @          0x171f3a1  impala::ImpalaServer::ImpalaServer()
    @          0x16ebc68  ImpaladMain()
    @           0xf6a52c  main
    @     0x7f4d036cfc87  __libc_start_main
    @           0xf6a38a  _start
E1029 21:33:40.957763 31965 impala-server.cc:511] Aborting Impala Server startup due to failure initializing lineage logging. Impalad exiting.{code}
The log should also show the reason, e.g. "Permission denied", as touch does:
{noformat}
$ touch /tmp/abc/file
touch: cannot touch '/tmp/abc/file': Permission denied{noformat}
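
As a rough illustration of the errno-based approach suggested in the issue description, here is a minimal standalone sketch. It is not Impala code: TryOpenLogFile is a hypothetical helper, and it uses the POSIX open() call directly rather than whatever SimpleLogger uses internally. It saves errno right after the failed call and appends the system's description of it to the existing "Could not open log file" message.

```cpp
#include <cerrno>
#include <fcntl.h>
#include <string>
#include <system_error>
#include <unistd.h>

// Hypothetical helper for illustration only (not part of Impala).
// Tries to open `path` for appending, creating it if needed. Returns an
// empty string on success, or an error message that includes the OS-level
// cause on failure.
std::string TryOpenLogFile(const std::string& path) {
  int fd = open(path.c_str(), O_WRONLY | O_CREAT | O_APPEND, 0644);
  if (fd < 0) {
    // Save errno immediately, since later library calls may overwrite it.
    const int err = errno;
    // generic_category().message() maps the errno value to readable text.
    return "Could not open log file: " + path + ": " +
        std::generic_category().message(err) +
        " (errno " + std::to_string(err) + ")";
  }
  close(fd);
  return "";
}
```

With the /tmp/abc directory from the steps above and a non-root user, the returned message would be expected to end with the same kind of reason that touch prints. In the real fix the text would presumably go into the Status returned by SimpleLogger instead of a plain string.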

> Impala should log the reason why it fails to open a log file
> ------------------------------------------------------------
>
>                 Key: IMPALA-13448
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13448
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>            Reporter: Quanlong Huang
>            Assignee: jiangwei
>            Priority: Critical
>              Labels: newbie, ramp-up
>
> When Impala fails to flush lineage events, audit events, or profiles, only 
> the log file name is logged: "Could not open log file: filename". We should 
> also log the cause for better troubleshooting.
> Here is an example of using the POSIX-compatible thread-local error number 
> variable:
> https://stackoverflow.com/a/5836222
> https://en.cppreference.com/w/cpp/header/cerrno



