[ 
https://issues.apache.org/jira/browse/IMPALA-13589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17902310#comment-17902310
 ] 

ASF subversion and git services commented on IMPALA-13589:
----------------------------------------------------------

Commit 9cd593840fdfac5d64ddd0bd71d3942d8f872e2c in impala's branch 
refs/heads/master from Zoltan Borok-Nagy
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=9cd593840 ]

IMPALA-13589: SELECT INPUT__FILE__NAME can crash Impala

If the user queries only the virtual column INPUT__FILE__NAME
from a table backed by text files, and the last row doesn't
end with the row delimiter (e.g. '\n'), then Impala crashes.

HdfsTextScanner::FinishScanRange() has specific code to handle
a last row that doesn't end with the row delimiter, and this is
where the last tuple gets filled. This code wasn't active when
we only read INPUT__FILE__NAME, so the last tuple contained
garbage, which caused a segfault later.

The fix is to always fill the last tuple if we have a template
tuple as it means we either have partition expressions, or
file-level virtual columns like INPUT__FILE__NAME.
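
The condition change described above can be illustrated with a toy
model. This is only a sketch, not the code from the commit: ToyTuple,
FinishLastRow, num_materialized_slots and always_fill_from_template are
hypothetical names, and the shape of the old condition (filling only
when some column slot is materialized) is an assumption made for
illustration.

```cpp
#include <string>

// Hypothetical stand-in for a tuple; not Impala's real Tuple class.
struct ToyTuple {
  std::string file_name;  // models the INPUT__FILE__NAME slot
  bool filled = false;    // false models "nothing written yet" (garbage)
};

// Models the handling of a final row that lacks the row delimiter.
// 'template_tuple' is non-null when there are partition expressions or
// file-level virtual columns. 'always_fill_from_template' switches between
// the assumed old behaviour (false) and the behaviour after the fix (true).
ToyTuple FinishLastRow(int num_materialized_slots,
                       const ToyTuple* template_tuple,
                       bool always_fill_from_template) {
  ToyTuple last;  // not yet filled
  const bool has_template = template_tuple != nullptr;
  if (num_materialized_slots > 0 ||
      (always_fill_from_template && has_template)) {
    if (has_template) last = *template_tuple;  // copy template values
    last.filled = true;
  }
  return last;
}
```

With zero materialized slots and a template tuple present, the old
condition in this model leaves the last tuple unfilled, while the new
condition copies the template values into it.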

Other file-level virtual columns only apply to Iceberg tables
which don't support text data files, so those are not affected
by this bug.

Testing
 * added e2e tests

Change-Id: I0ea8e7fed77cbc9ae90a858eafeee9dcfd73d143
Reviewed-on: http://gerrit.cloudera.org:8080/22141
Tested-by: Impala Public Jenkins <[email protected]>
Reviewed-by: Gabor Kaszab <[email protected]>


> SELECT INPUT__FILE__NAME can crash Impala
> -----------------------------------------
>
>                 Key: IMPALA-13589
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13589
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Zoltán Borók-Nagy
>            Assignee: Zoltán Borók-Nagy
>            Priority: Major
>             Fix For: Impala 4.5.0
>
>
> To crash Impala we need a text file that doesn't have a '\n' (newline) 
> character at the end of the file.
> h2. Repro
> In impala shell (dev environment):
> {noformat}
> create table simple_text (s string)
> stored as textfile;
> {noformat}
> In bash:
> {noformat}
> printf "A\nA\nA" >data.txt     # no '\n' in final row
> hdfs dfs -put -f data.txt /test-warehouse/simple_text
> {noformat}
> In impala shell:
> {noformat}
> select INPUT__FILE__NAME from simple_text;
> {noformat}
> h2. The stacktrace:
> {noformat}
> #0  0x00007f721a2969fc in pthread_kill () from /lib/x86_64-linux-gnu/libc.so.6
> #1  0x00007f721a242476 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> #2  0x00007f721a2287f3 in abort () from /lib/x86_64-linux-gnu/libc.so.6
> #3  0x0000000002369e85 in google::DumpStackTraceAndExit() [clone .cold] ()
> #4  0x0000000005c38f6d in google::LogMessage::Fail() ()
> #5  0x0000000005c3ae84 in google::LogMessage::SendToLog() ()
> #6  0x0000000005c3894c in google::LogMessage::Flush() ()
> #7  0x0000000005c3b3a9 in google::LogMessageFatal::~LogMessageFatal() ()
> #8  0x000000000451b923 in 
> impala::BufferedTupleStream::DeepCopyInternal<false> (this=0x158bd8c0, 
> row=0x12794000, data=0x7f705de40b18, data_end=0x1ca0000d "") at 
> /home/boroknagyz/Impala/be/src/runtime/buffered-tuple-stream.cc:1092
> #9  0x0000000004519019 in impala::BufferedTupleStream::DeepCopy 
> (this=0x158bd8c0, row=0x12794000, data=0x7f705de40b18, data_end=0x1ca0000d 
> "") at /home/boroknagyz/Impala/be/src/runtime/buffered-tuple-stream.cc:1052
> #10 0x0000000004518438 in impala::BufferedTupleStream::AddRowSlow 
> (this=0x158bd8c0, row=0x12794000, status=0x7f705de40cd8) at 
> /home/boroknagyz/Impala/be/src/runtime/buffered-tuple-stream.cc:1003
> #11 0x0000000004518e2b in impala::BufferedTupleStream::AddRow 
> (this=0x158bd8c0, row=0x12794000, status=0x7f705de40cd8) at 
> /home/boroknagyz/Impala/be/src/runtime/buffered-tuple-stream.cc:1040
> #12 0x000000000458c8ed in impala::SpillableRowBatchQueue::AddBatch 
> (this=0x13aa0800, batch=0x16977e40) at 
> /home/boroknagyz/Impala/be/src/runtime/spillable-row-batch-queue.cc:81
> #13 0x00000000033fbd82 in impala::BufferedPlanRootSink::Send 
> (this=0x122ff680, state=0x122ff8c0, batch=0x16977e40) at 
> /home/boroknagyz/Impala/be/src/exec/buffered-plan-root-sink.cc:92
> #14 0x0000000002b0ec68 in impala::FragmentInstanceState::ExecInternal 
> (this=0x1b2ba680) at 
> /home/boroknagyz/Impala/be/src/runtime/fragment-instance-state.cc:452
> #15 0x0000000002b0accb in impala::FragmentInstanceState::Exec 
> (this=0x1b2ba680) at 
> /home/boroknagyz/Impala/be/src/runtime/fragment-instance-state.cc:104
> #16 0x0000000002a48530 in impala::QueryState::ExecFInstance (this=0x148ee400, 
> fis=0x1b2ba680) at /home/boroknagyz/Impala/be/src/runtime/query-state.cc:1013
> #17 0x0000000002a464c8 in operator() (__closure=0x11f89d88) at 
> /home/boroknagyz/Impala/be/src/runtime/query-state.cc:918
> #18 0x0000000002a4b585 in 
> boost::detail::function::void_function_obj_invoker0<impala::QueryState::StartFInstances()::<lambda()>,
>  void>::invoke(boost::detail::function::function_buffer &) 
> (function_obj_ptr=...)
>     at 
> /opt/Impala-Toolchain/toolchain-packages-gcc10.4.0/boost-1.74.0-p1/include/boost/function/function_template.hpp:158
> #19 0x00000000029d929a in boost::function0<void>::operator() 
> (this=0x11f89d80) at 
> /opt/Impala-Toolchain/toolchain-packages-gcc10.4.0/boost-1.74.0-p1/include/boost/function/function_template.hpp:763
> #20 0x00000000031b1597 in 
> impala::Thread::SuperviseThread(std::__cxx11::basic_string<char, 
> std::char_traits<char>, std::allocator<char> > const&, 
> std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> 
> > const&, boost::function<void ()> const&, impala::ThreadDebugInfo const*, 
> impala::Promise<long, (impala::PromiseMode)0>*) (name=..., category=..., 
> functor=..., parent_thread_info=0x7f705e642730, thread_started=0x7f705e6407d0)
>     at /home/boroknagyz/Impala/be/src/util/thread.cc:360
> ...{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
