raulcd commented on PR #47627:
URL: https://github.com/apache/arrow/pull/47627#issuecomment-3338192561

   For the musllinux ORC issue:
   
   > @raulcd Did you check that `arrow::adapters::orc::ORCFileReader::Impl::ReadBatch` catches exceptions appropriately?
   
   No, it does not catch the exception. The problem is that a minimal ORC reproducer also fails to catch it. No Arrow is involved here: the program only calls ORC and tries to catch the exception. See the following `test_orc_exceptions.cc` file:
   ```c++
   #include <cstdlib>
   #include <exception>
   #include <iostream>
   #include <typeinfo>
   #include <utility>
   #include "orc/OrcFile.hh"

   int main() {
       // Set invalid timezone directory to trigger the error
       setenv("TZDIR", "/tmp/non_existent_timezone_dir", 1);

       try {
           std::cout << "Starting test" << std::endl;
           auto inStream = orc::readFile("/arrow/cpp/build/orc_ep-prefix/src/orc_ep/examples/TestOrcFile.testDate1900.orc");
           auto reader = orc::createReader(std::move(inStream), orc::ReaderOptions{});
           auto rowReader = reader->createRowReader(orc::RowReaderOptions{});
           auto batch = rowReader->createRowBatch(100);

           std::cout << "Calling next() triggers TimezoneError." << std::endl;
           bool hasData = rowReader->next(*batch);
           // Reaching this line means next() did not throw.
           std::cout << "ERROR: No exception was thrown! hasData=" << hasData << std::endl;
           return 1;

       } catch (const std::exception& e) {
           // Expected path: orc::TimezoneError derives from std::exception.
           std::cout << "Exception caught!" << std::endl;
           std::cout << "Type: " << typeid(e).name() << std::endl;
           std::cout << "Message: " << e.what() << std::endl;
           return 0;
       } catch (...) {
           std::cout << "Unknown exception caught!" << std::endl;
           return 0;
       }

       std::cout << "UNEXPECTED: No exception thrown at all" << std::endl;
       return 1;
   }
   ```
   And the exception is not caught:
   ```sh
   d4cb88cbc6c3:/# g++ -std=c++17 -g \
       -I/opt/vcpkg/installed/amd64-linux-static-release/include \
       test_orc_exceptions.cc \
       -L/opt/vcpkg/installed/amd64-linux-static-release/lib \
       -lorc -lprotobuf -lutf8_range -lutf8_validity \
       -Wl,--start-group \
       /opt/vcpkg/installed/amd64-linux-static-release/lib/libabsl*.a \
       -Wl,--end-group \
       -lz -llz4 -lzstd -lsnappy -lpthread \
       -o test_orc_exceptions_debug
   d4cb88cbc6c3:/# ./test_orc_exceptions_debug 
   Starting test
   Calling next() triggers TimezoneError.
   terminate called after throwing an instance of 'orc::TimezoneError'
     what():  Time zone file /tmp/non_existent_timezone_dir/US/Pacific does not exist. Please install IANA time zone database and set TZDIR env.
   Aborted (core dumped)
   d4cb88cbc6c3:/#
   ```
   This is in the musllinux Docker container, using the same ORC build that we use (version 2.1.0). Running it under gdb:
   ```sh
   d4cb88cbc6c3:/# gdb -batch \
       -ex "set environment TZDIR=/tmp/non_existent" \
       -ex "break __cxa_throw" \
       -ex "run" \
       -ex "print (char*)$rdi" \
       -ex "info registers" \
       -ex "bt" \
       -ex "continue" \
       -ex "quit" \
       ./test_orc_exceptions_debug
   Breakpoint 1 at 0x2e7c0
   warning: Error disabling address space randomization: Operation not permitted
   warning: the debug information found in "/usr/lib/debug//lib/ld-musl-x86_64.so.1.debug" does not match "/lib/ld-musl-x86_64.so.1" (CRC mismatch).
   === Minimal ORC Timezone Error Test ===
   1. Set TZDIR to invalid path
   2. Opening ORC file...
   3. Creating ORC reader...
   4. Creating row reader...
   5. Creating batch...
   6. Calling next() - this should trigger TimezoneError...
   
   Breakpoint 1, 0x000073b1805c7170 in __cxa_throw () from /usr/lib/libstdc++.so.6
   A syntax error in expression, near `'.
   rax            0x0                 0
   rbx            0x7ffc8ad68300      140722637800192
   rcx            0x2                 2
   rdx            0x604166456960      105834004965728
   rsi            0x60416686e0f0      105834009256176
   rdi            0x73b1804c07b0      127206198871984
   rbp            0x7ffc8ad68140      0x7ffc8ad68140
   rsp            0x7ffc8ad680b8      0x7ffc8ad680b8
   r8             0x62                98
   r9             0x5                 5
   r10            0x2                 2
   r11            0x1ec               492
   r12            0x73b180791bf0      127206201826288
   r13            0x73b1804c07b0      127206198871984
   r14            0x7ffc8ad68110      140722637799696
   r15            0x7ffc8ad68180      140722637799808
   rip            0x73b1805c7170      0x73b1805c7170 <__cxa_throw>
   eflags         0x246               [ PF ZF IF ]
   cs             0x33                51
   ss             0x2b                43
   ds             0x0                 0
   es             0x0                 0
   fs             0x0                 0
   gs             0x0                 0
   k0             0x0                 0
   k1             0x0                 0
   k2             0x0                 0
   k3             0x0                 0
   k4             0x0                 0
   k5             0x0                 0
   k6             0x0                 0
   k7             0x0                 0
   fs_base        0x73b180844b80      127206202559360
   gs_base        0x0                 0
   #0  0x000073b1805c7170 in __cxa_throw () from /usr/lib/libstdc++.so.6
   #1  0x000060416645df3e in orc::loadTZDB(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
   #2  0x000060416645e743 in std::once_flag::_Prepare_execution::_Prepare_execution<std::call_once<orc::LazyTimezone::getImpl() const::{lambda()#1}>(std::once_flag&, orc::LazyTimezone::getImpl() const::{lambda()#1}&&)::{lambda()#1}>(orc::LazyTimezone::getImpl() const::{lambda()#1}&)::{lambda()#1}::_FUN() ()
   #3  0x000073b1807ff7dd in ?? () from /lib/ld-musl-x86_64.so.1
   #4  0x000073b1807ff776 in pthread_mutexattr_settype () from /lib/ld-musl-x86_64.so.1
   #5  0x00006041668822c0 in ?? ()
   #6  0x0000000000000000 in ?? ()
   terminate called after throwing an instance of 'orc::TimezoneError'
     what():  Time zone file /tmp/non_existent_timezone_dir/US/Pacific does not exist. Please install IANA time zone database and set TZDIR env.
   
   Program received signal SIGABRT, Aborted.
   0x000073b1807ef6d6 in setjmp () from /lib/ld-musl-x86_64.so.1
   A debugging session is active.
   
        Inferior 1 [process 98908] will be killed.
   
   Quit anyway? (y or n) [answered Y; input not from terminal]
   ```
   This comes from the `std::call_once` in ORC here:
   
   https://github.com/apache/orc/blob/2cb13946b71140be08b54111ff36fc17da5f09af/c%2B%2B/src/Timezone.cc#L694-L706
   
   What surprises me is that I am unable to catch the exception no matter what I try :)

