cambyzju opened a new pull request, #30738:
URL: https://github.com/apache/doris/pull/30738
## Proposed changes
While read orc data through multi-catalog, if column type in meta is string,
but the real data type is INT or Timestamp or others, it may make be crash:
```
(gdb) bt
#0 0x00007f73c30c2387 in raise () from /lib64/libc.so.6
#1 0x00007f73c30c3a78 in abort () from /lib64/libc.so.6
#2 0x000056552fad78b9 in ?? ()
#3 0x000056552faccecd in google::LogMessage::Fail() ()
#4 0x000056552facf409 in google::LogMessage::SendToLog() ()
#5 0x000056552facca36 in google::LogMessage::Flush() ()
#6 0x000056552facfa79 in google::LogMessageFatal::~LogMessageFatal() ()
#7 0x000056552b76b115 in
doris::vectorized::ColumnString::check_chars_length (element_number=<optimized
out>, total_length=<optimized out>,
this=<optimized out>) at
/data/doris-1.x/be/src/vec/columns/column_string.h:66
#8 doris::vectorized::ColumnString::insert_many_strings (this=<optimized
out>, strings=<optimized out>, num=4064)
at /data/doris-1.x/be/src/vec/columns/column_string.h:270
#9 0x000056552f66548c in
doris::vectorized::OrcReader::_decode_string_column
(this=this@entry=0x7f726d3b6a00, col_name=..., data_column=...,
type_kind=@0x7f67b47bc0e0: orc::TIMESTAMP, cvb=cvb@entry=0x7f6d98aadd60,
num_values=4064)
at /data/doris-1.x/be/src/vec/exec/format/orc/vorc_reader.cpp:649
#10 0x000056552f670300 in
doris::vectorized::OrcReader::_orc_column_to_doris_column
(this=this@entry=0x7f726d3b6a00, col_name=..., doris_column=...,
data_type=..., orc_column_type=0x7f6d96907440, cvb=0x7f6d98aadd60,
num_values=4064) at
/data/doris-1.x/be/src/vec/exec/format/orc/vorc_reader.cpp:738
#11 0x000056552f672d50 in doris::vectorized::OrcReader::get_next_block
(this=0x7f726d3b6a00, block=0x7f6d7fe59b20, read_rows=0x7f67b47bc318,
eof=<optimized out>) at
/data/doris-1.x/be/src/vec/exec/format/orc/vorc_reader.cpp:788
#12 0x000056552f633ad8 in doris::vectorized::VFileScanner::_get_block_impl
(this=0x7f6f051b3a00, state=<optimized out>, block=0x7f6d7fe59b20,
eof=0x7f67b47bc539) at
/data/doris-1.x/be/src/vec/exec/scan/vfile_scanner.cpp:155
#13 0x000056552f5ff329 in doris::vectorized::VScanner::get_block
(this=this@entry=0x7f6f051b3a00, state=state@entry=0x7f6d70a61900,
block=block@entry=0x7f6d7fe59b20, eof=eof@entry=0x7f67b47bc539) at
/data/doris-1.x/be/src/vec/exec/scan/vscanner.cpp:54
#14 0x000056552f5fc682 in doris::vectorized::ScannerScheduler::_scanner_scan
(this=<optimized out>, scheduler=<optimized out>, ctx=0x7f6f007bdc00,
scanner=0x7f6f051b3a00) at
/data/doris-1.x/be/src/vec/exec/scan/scanner_scheduler.cpp:247
#15 0x000056552a903b15 in std::function<void ()>::operator()() const
(this=<optimized out>) at
/var/local/ldb-toolchain/include/c++/11/bits/std_function.h:560
#16 doris::FunctionRunnable::run (this=<optimized out>) at
/data/doris-1.x/be/src/util/threadpool.cpp:46
#17 doris::ThreadPool::dispatch_thread (this=0x7f739ad6b180) at
/data/doris-1.x/be/src/util/threadpool.cpp:535
#18 0x000056552a8f8f6f in std::function<void ()>::operator()() const
(this=0x7f7267d7da38) at
/var/local/ldb-toolchain/include/c++/11/bits/std_function.h:560
#19 doris::Thread::supervise_thread (arg=0x7f7267d7da20) at
/data/doris-1.x/be/src/util/thread.cpp:454
#20 0x00007f73c2e77ea5 in start_thread () from /lib64/libpthread.so.0
#21 0x00007f73c318a9fd in ioperm () from /lib64/libc.so.6
#22 0x0000000000000000 in ?? ()
```
Here we expect `orc::StringVectorBatch*`, but the real type is
`orc::TimestampVectorBatch*`:
<img width="1167" alt="image"
src="https://github.com/apache/doris/assets/10771715/ffbfc127-f1d8-4a42-ada3-18755d8a8aad">
In the code, we already try to check the cast result:
```
auto* data = down_cast<orc::StringVectorBatch*>(cvb);
if (data == nullptr) {
return Status::InternalError("Wrong data type for colum '{}'",
col_name);
}
```
Unfortunately `down_cast` equals to `static_cast`, because assert do not
work in release build type.
```
template <typename To, typename From> // use like this: down_cast<T*>(foo);
inline To down_cast(From* f) { // so we only accept pointers
// Ensures that To is a sub-type of From *. This test is here only
// for compile-time type checking, and has no overhead in an
// optimized build at run-time, as it will be optimized away
// completely.
// TODO(user): This should use COMPILE_ASSERT.
if (false) {
::implicit_cast<From*, To>(NULL);
}
// uses RTTI in dbg and fastbuild. asserts are disabled in opt builds.
assert(f == NULL || dynamic_cast<To>(f) != NULL);
return static_cast<To>(f);
}
```
While we build release version, we define `NDEBUG` to turn off assert
```
# For CMAKE_BUILD_TYPE=Release
# -O3: Enable all compiler optimizations
# -DNDEBUG: Turn off dchecks/asserts/debug only code.
set(CXX_FLAGS_RELEASE "${CXX_GCC_FLAGS} -O3 -DNDEBUG")
```
## Further comments
If this is a relatively large or complex change, kick off the discussion at
[[email protected]](mailto:[email protected]) by explaining why you
chose the solution you did and what alternatives you considered, etc...
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]