raulcd commented on PR #47627:
URL: https://github.com/apache/arrow/pull/47627#issuecomment-3338192561
For the musllinux ORC issue:
> @raulcd Did you check that `arrow::adapters::orc::ORCFileReader::Impl::ReadBatch` catches exceptions appropriately?
No, it does not catch the exception. The problem is that even a minimal ORC reproducer fails to catch the exception. There is no Arrow involved here, only calls into ORC wrapped in a `try`/`catch`. See the following `test_orc_exceptions.cc` file:
```c++
#include <cstdlib>
#include <exception>
#include <iostream>
#include <typeinfo>
#include <utility>

#include "orc/OrcFile.hh"

int main() {
  // Point TZDIR to a non-existent directory to trigger the error
  setenv("TZDIR", "/tmp/non_existent_timezone_dir", 1);
  try {
    std::cout << "Starting test" << std::endl;
    auto inStream = orc::readFile(
        "/arrow/cpp/build/orc_ep-prefix/src/orc_ep/examples/TestOrcFile.testDate1900.orc");
    auto reader = orc::createReader(std::move(inStream), orc::ReaderOptions{});
    auto rowReader = reader->createRowReader(orc::RowReaderOptions{});
    auto batch = rowReader->createRowBatch(100);
    std::cout << "Calling next() triggers TimezoneError." << std::endl;
    // Reading the first batch lazily loads the timezone database
    bool hasData = rowReader->next(*batch);
    std::cout << "ERROR: No exception was thrown! hasData=" << hasData << std::endl;
    return 1;
  } catch (const std::exception& e) {
    std::cout << "Exception caught!" << std::endl;
    std::cout << "Type: " << typeid(e).name() << std::endl;
    std::cout << "Message: " << e.what() << std::endl;
    return 0;
  } catch (...) {
    std::cout << "Unknown exception caught!" << std::endl;
    return 0;
  }
  std::cout << "UNEXPECTED: No exception thrown at all" << std::endl;
  return 1;
}
```
And the exception is not caught:
```sh
d4cb88cbc6c3:/# g++ -std=c++17 -g \
    -I/opt/vcpkg/installed/amd64-linux-static-release/include \
    test_orc_exceptions.cc \
    -L/opt/vcpkg/installed/amd64-linux-static-release/lib -lorc -lprotobuf \
    -lutf8_range -lutf8_validity \
    -Wl,--start-group /opt/vcpkg/installed/amd64-linux-static-release/lib/libabsl*.a -Wl,--end-group \
    -lz -llz4 -lzstd -lsnappy -lpthread \
    -o test_orc_exceptions_debug
d4cb88cbc6c3:/# ./test_orc_exceptions_debug
Starting test
Calling next() triggers TimezoneError.
terminate called after throwing an instance of 'orc::TimezoneError'
what(): Time zone file /tmp/non_existent_timezone_dir/US/Pacific does not exist. Please install IANA time zone database and set TZDIR env.
Aborted (core dumped)
d4cb88cbc6c3:/#
```
This uses the musllinux Docker container with the same ORC build that we use (version 2.1.0). Here is the same binary under gdb:
```sh
d4cb88cbc6c3:/# gdb -batch -ex "set environment TZDIR=/tmp/non_existent" \
    -ex "break __cxa_throw" -ex "run" -ex "print (char*)$rdi" \
    -ex "info registers" -ex "bt" -ex "continue" -ex "quit" \
    ./test_orc_exceptions_debug
Breakpoint 1 at 0x2e7c0
warning: Error disabling address space randomization: Operation not permitted
warning: the debug information found in "/usr/lib/debug//lib/ld-musl-x86_64.so.1.debug" does not match "/lib/ld-musl-x86_64.so.1" (CRC mismatch).
=== Minimal ORC Timezone Error Test ===
1. Set TZDIR to invalid path
2. Opening ORC file...
3. Creating ORC reader...
4. Creating row reader...
5. Creating batch...
6. Calling next() - this should trigger TimezoneError...
Breakpoint 1, 0x000073b1805c7170 in __cxa_throw () from /usr/lib/libstdc++.so.6
A syntax error in expression, near `'.
rax 0x0 0
rbx 0x7ffc8ad68300 140722637800192
rcx 0x2 2
rdx 0x604166456960 105834004965728
rsi 0x60416686e0f0 105834009256176
rdi 0x73b1804c07b0 127206198871984
rbp 0x7ffc8ad68140 0x7ffc8ad68140
rsp 0x7ffc8ad680b8 0x7ffc8ad680b8
r8 0x62 98
r9 0x5 5
r10 0x2 2
r11 0x1ec 492
r12 0x73b180791bf0 127206201826288
r13 0x73b1804c07b0 127206198871984
r14 0x7ffc8ad68110 140722637799696
r15 0x7ffc8ad68180 140722637799808
rip 0x73b1805c7170 0x73b1805c7170 <__cxa_throw>
eflags 0x246 [ PF ZF IF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
k0 0x0 0
k1 0x0 0
k2 0x0 0
k3 0x0 0
k4 0x0 0
k5 0x0 0
k6 0x0 0
k7 0x0 0
fs_base 0x73b180844b80 127206202559360
gs_base 0x0 0
#0 0x000073b1805c7170 in __cxa_throw () from /usr/lib/libstdc++.so.6
#1 0x000060416645df3e in orc::loadTZDB(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
#2 0x000060416645e743 in std::once_flag::_Prepare_execution::_Prepare_execution<std::call_once<orc::LazyTimezone::getImpl() const::{lambda()#1}>(std::once_flag&, orc::LazyTimezone::getImpl() const::{lambda()#1}&&)::{lambda()#1}>(orc::LazyTimezone::getImpl() const::{lambda()#1}&)::{lambda()#1}::_FUN() ()
#3 0x000073b1807ff7dd in ?? () from /lib/ld-musl-x86_64.so.1
#4 0x000073b1807ff776 in pthread_mutexattr_settype () from /lib/ld-musl-x86_64.so.1
#5 0x00006041668822c0 in ?? ()
#6 0x0000000000000000 in ?? ()
terminate called after throwing an instance of 'orc::TimezoneError'
what(): Time zone file /tmp/non_existent_timezone_dir/US/Pacific does not exist. Please install IANA time zone database and set TZDIR env.
Program received signal SIGABRT, Aborted.
0x000073b1807ef6d6 in setjmp () from /lib/ld-musl-x86_64.so.1
A debugging session is active.
Inferior 1 [process 98908] will be killed.
Quit anyway? (y or n) [answered Y; input not from terminal]
```
The exception is thrown from inside the `std::call_once` in ORC's `Timezone.cc`:
https://github.com/apache/orc/blob/2cb13946b71140be08b54111ff36fc17da5f09af/c%2B%2B/src/Timezone.cc#L694-L706
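In the backtrace, the frames above the `std::call_once` thunk (#3 and #4) are inside `/lib/ld-musl-x86_64.so.1`, so the `orc::TimezoneError` has to unwind through musl code before it can reach the handler in `main`. A further reduction that drops ORC entirely could show whether `std::call_once` alone reproduces the abort. The sketch below throws from a `std::call_once` callable, roughly the same shape as `orc::LazyTimezone::getImpl()`. The names `probe::load_or_throw` and `probe::call_once_exception_is_catchable` are hypothetical, and the `std::runtime_error` only stands in for `orc::TimezoneError`.

```cpp
#include <mutex>
#include <stdexcept>

namespace probe {

// Hypothetical stand-in for orc::loadTZDB(): always throws, as it does
// when TZDIR points to a directory that does not exist.
[[noreturn]] void load_or_throw() {
  throw std::runtime_error("simulated TimezoneError");
}

// Throws from inside a std::call_once callable and reports whether the
// exception reached this handler. Per the C++ standard the exception
// should propagate to the caller of std::call_once; if unwinding through
// the call_once machinery fails, the process ends in std::terminate
// instead of returning.
bool call_once_exception_is_catchable() {
  std::once_flag flag;  // fresh flag on every call
  try {
    std::call_once(flag, load_or_throw);
  } catch (const std::runtime_error&) {
    return true;
  }
  return false;
}

}  // namespace probe
```

Calling `probe::call_once_exception_is_catchable()` from a small `main` built with the same musl toolchain, without any ORC or vcpkg libraries, would show whether it returns `true` or aborts the way `test_orc_exceptions_debug` does. A fresh `std::once_flag` is used on each call so the probe never depends on how an earlier exceptional call left the flag.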
What surprises me is that I am unable to catch the exception no matter what I try :)
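If it turns out that exceptions cannot propagate out of `std::call_once` on musl, one possible mitigation, sketched here as an assumption rather than a proposed ORC patch, is to never let the exception leave the callable: catch it inside, store it in a `std::exception_ptr`, and rethrow it only after `std::call_once` has returned normally. `LazyResource` and its members are hypothetical and only loosely modelled on `orc::LazyTimezone`.

```cpp
#include <exception>
#include <memory>
#include <mutex>
#include <stdexcept>
#include <string>

// Hypothetical lazily initialised resource, loosely modelled on the
// std::call_once pattern in ORC's LazyTimezone::getImpl(). The names are
// illustrative and are not ORC's API.
class LazyResource {
 public:
  explicit LazyResource(bool fail) : fail_(fail) {}

  const std::string& get() {
    std::call_once(flag_, [this] {
      // Catch everything inside the callable so that no exception has to
      // unwind through std::call_once (in libstdc++ this typically goes
      // through pthread_once in the C library).
      try {
        value_ = std::make_unique<std::string>(load());
      } catch (...) {
        error_ = std::current_exception();
      }
    });
    // Rethrow outside of std::call_once, from ordinary C++ frames.
    if (error_) std::rethrow_exception(error_);
    return *value_;
  }

 private:
  std::string load() const {
    if (fail_) throw std::runtime_error("simulated TimezoneError");
    return "loaded";
  }

  bool fail_;
  std::once_flag flag_;
  std::unique_ptr<std::string> value_;
  std::exception_ptr error_;
};
```

One trade-off of this design: the first failure is cached, so later calls to `get()` rethrow the same error instead of retrying the load, whereas a callable that throws leaves the flag unset and allows a retry.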