jorisvandenbossche opened a new issue, #38821:
URL: https://github.com/apache/arrow/issues/38821

   Found while investigating https://github.com/apache/arrow/issues/38618 (so 
might be caused by the same change in 14.0, possibly related to 
https://github.com/apache/arrow/pull/35440).
   
   Assuming `minio` is running the way it is set up in our tests, the following 
script crashes:
   
   ```python
   import os
   from pyarrow.fs import S3FileSystem
   
   host, port, access_key, secret_key = ('localhost', 54383, 'arrow', 'apachearrow')
   
   s3_bucket = 'pyarrow-filesystem/'
   
   s3fs = S3FileSystem(
       access_key=access_key,
       secret_key=secret_key,
       endpoint_override='{}:{}'.format(host, port),
       scheme='http',
       allow_bucket_creation=True,
       allow_bucket_deletion=True
   )
   
   s3fs.create_dir(s3_bucket)
   
   test_dir = "test_dir"
   s3fs.create_dir(s3_bucket + "/" + test_dir)
   print(s3fs.get_file_info(s3_bucket + "/" + test_dir))
   print(s3fs.get_file_info(s3_bucket + test_dir))
   s3fs.delete_dir(s3_bucket + "/" + test_dir)
   ```
   
   output:
   
   ```
   <FileInfo for 'pyarrow-filesystem//test_dir': type=FileType.Directory>
   <FileInfo for 'pyarrow-filesystem/test_dir': type=FileType.Directory>
   terminate called after throwing an instance of 'std::out_of_range'
     what():  basic_string::substr: __pos (which is 19) > this->size() (which is 18)
   Aborted (core dumped)
   ```
   
   What happened here is that I accidentally constructed a directory path with 
a double slash using `s3_bucket + "/" + test_dir`, while `s3_bucket` already 
included a trailing `/`. This was not intentional and, once discovered, easy to 
fix, but we should still not crash on something like that.
   
   For S3 (at least using minio), it seems we allow creating the directory 
(the double slash is ignored and a directory named "test_dir" is created) and 
getting the file info (with either a single or a double slash, it returns the 
info for the same directory). But when trying to delete the directory using 
the name with the double slash, the process aborts on an uncaught 
`std::out_of_range` exception.
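
Until this is fixed, a caller can avoid the double slash by normalizing the path before passing it to the filesystem. The following is only a minimal sketch of such a workaround: `join_s3_path` is a hypothetical helper, not part of `pyarrow`, and it assumes that empty path segments are never intended.

```python
def join_s3_path(*parts: str) -> str:
    """Join path parts with single slashes, dropping empty segments.

    Hypothetical helper for illustration; not a pyarrow API.
    """
    segments = []
    for part in parts:
        # Splitting on "/" and skipping empty strings removes leading,
        # trailing, and repeated slashes in each part.
        segments.extend(segment for segment in part.split("/") if segment)
    return "/".join(segments)


s3_bucket = "pyarrow-filesystem/"
test_dir = "test_dir"

# Both spellings from the script above collapse to the same path.
print(join_s3_path(s3_bucket, test_dir))
print(join_s3_path(s3_bucket + "/" + test_dir))
```

The result can then be passed to `s3fs.delete_dir(...)` in place of the hand-concatenated string. This only sidesteps the problem on the caller side; it does not address the crash itself.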
   
   GDB backtrace:
   
   <details>
   
   ```
   #0  0x00007ffff7c7800b in raise () from /lib/x86_64-linux-gnu/libc.so.6
   #1  0x00007ffff7c57859 in abort () from /lib/x86_64-linux-gnu/libc.so.6
   #2  0x00007ffff3822026 in __gnu_cxx::__verbose_terminate_handler () at 
../../../../libstdc++-v3/libsupc++/vterminate.cc:95
   #3  0x00007ffff3820514 in __cxxabiv1::__terminate (handler=<optimized out>) 
at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:48
   #4  0x00007ffff3820566 in std::terminate () at 
../../../../libstdc++-v3/libsupc++/eh_terminate.cc:58
   #5  0x00007ffff3820758 in __cxxabiv1::__cxa_throw (obj=0x7fffa00065d0, 
tinfo=0x7ffff39155e8 <typeinfo for std::out_of_range>, dest=0x7ffff382cd34 
<std::out_of_range::~out_of_range()>)
       at ../../../../libstdc++-v3/libsupc++/eh_throw.cc:98
   #6  0x00007ffff383bbea in std::__throw_out_of_range_fmt 
(__fmt=__fmt@entry=0x7ffff4eba6c0 "%s: __pos (which is %zu) > this->size() 
(which is %zu)") at ../../../../../libstdc++-v3/src/c++11/functexcept.cc:101
   #7  0x00007ffff4cedfee in std::__cxx11::basic_string<char, 
std::char_traits<char>, std::allocator<char> >::_M_check (__s=0x7ffff4eba770 
"basic_string::substr", __pos=<optimized out>, this=0x7fffa000a088)
       at 
/home/joris/miniconda3/envs/arrow-dev/x86_64-conda-linux-gnu/include/c++/10.4.0/bits/basic_string.h:321
   #8  std::__cxx11::basic_string<char, std::char_traits<char>, 
std::allocator<char> >::substr (__n=18446744073709551615, __pos=<optimized 
out>, this=0x7fffa000a088)
       at 
/home/joris/miniconda3/envs/arrow-dev/x86_64-conda-linux-gnu/include/c++/10.4.0/bits/basic_string.h:2848
   #9  
arrow::fs::S3FileSystem::Impl::DoDeleteDirContentsAsync(std::__cxx11::basic_string<char,
 std::char_traits<char>, std::allocator<char> > const&, 
std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > 
const&)::{lambda(arrow::util::AsyncTaskScheduler*, 
arrow::fs::S3FileSystem::Impl*)#1}::operator()(arrow::util::AsyncTaskScheduler*,
 arrow::fs::S3FileSystem::Impl*) const::{lambda()#1}::operator()() 
const::{lambda(std::vector<arrow::fs::FileInfo, 
std::allocator<arrow::fs::FileInfo> > 
const&)#1}::operator()(std::vector<arrow::fs::FileInfo, 
std::allocator<arrow::fs::FileInfo> > const&) const (
       __closure=__closure@entry=0x7fffa0004bc8, file_infos=...) at 
/home/joris/scipy/repos/arrow/cpp/src/arrow/filesystem/s3fs.cc:2411
   #10 0x00007ffff4d07ff7 in arrow::LoopBody::Callback::operator() (next=..., 
this=0x7fffa0004bc8) at 
/home/joris/scipy/repos/arrow/cpp/src/arrow/util/async_generator.h:93
   #11 
arrow::detail::ContinueFuture::operator()<arrow::VisitAsyncGenerator<std::vector<arrow::fs::FileInfo,
 std::allocator<arrow::fs::FileInfo> >, 
arrow::fs::S3FileSystem::Impl::DoDeleteDirContentsAsync(std::__cxx11::basic_string<char,
 std::char_traits<char>, std::allocator<char> > const&, 
std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > 
const&)::{lambda(arrow::util::AsyncTaskScheduler*, 
arrow::fs::S3FileSystem::Impl*)#1}::operator()(arrow::util::AsyncTaskScheduler*,
 arrow::fs::S3FileSystem::Impl*) const::{lambda()#1}::operator()() 
const::{lambda(std::vector<arrow::fs::FileInfo, 
std::allocator<arrow::fs::FileInfo> > 
const&)#1}>(std::function<arrow::Future<std::vector<arrow::fs::FileInfo, 
std::allocator<arrow::fs::FileInfo> > > ()>, 
arrow::fs::S3FileSystem::Impl::DoDeleteDirContentsAsync(std::__cxx11::basic_string<char,
 std::char_traits<char>, std::allocator<char> > const&, 
std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > 
 const&)::{lambda(arrow::util::AsyncTaskScheduler*, 
arrow::fs::S3FileSystem::Impl*)#1}::operator()(arrow::util::AsyncTaskScheduler*,
 arrow::fs::S3FileSystem::Impl*) const::{lambda()#1}::operator()() 
const::{lambda(std::vector<arrow::fs::FileInfo, 
std::allocator<arrow::fs::FileInfo> > const&)#1})::LoopBody::Callback, 
std::vector<arrow::fs::FileInfo, std::allocator<arrow::fs::FileInfo> > const&, 
arrow::Result<std::optional<arrow::internal::Empty> >, 
arrow::Future<std::optional<arrow::internal::Empty> > 
>(arrow::Future<std::optional<arrow::internal::Empty> >, 
arrow::VisitAsyncGenerator<std::vector<arrow::fs::FileInfo, 
std::allocator<arrow::fs::FileInfo> >, 
arrow::fs::S3FileSystem::Impl::DoDeleteDirContentsAsync(std::__cxx11::basic_string<char,
 std::char_traits<char>, std::allocator<char> > const&, 
std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > 
const&)::{lambda(arrow::util::AsyncTaskScheduler*, 
arrow::fs::S3FileSystem::Impl*)#1}::operator()(arrow::util::AsyncTaskScheduler*,
 arrow::fs::S3FileSystem::Impl*) 
const::{lambda()#1}::operator()() 
const::{lambda(std::vector<arrow::fs::FileInfo, 
std::allocator<arrow::fs::FileInfo> > 
const&)#1}>(std::function<arrow::Future<std::vector<arrow::fs::FileInfo, 
std::allocator<arrow::fs::FileInfo> > > ()>, 
arrow::fs::S3FileSystem::Impl::DoDeleteDirContentsAsync(std::__cxx11::basic_string<char,
 std::char_traits<char>, std::allocator<char> > const&, 
std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > 
const&)::{lambda(arrow::util::AsyncTaskScheduler*, 
arrow::fs::S3FileSystem::Impl*)#1}::operator()(arrow::util::AsyncTaskScheduler*,
 arrow::fs::S3FileSystem::Impl*) const::{lambda()#1}::operator()() 
const::{lambda(std::vector<arrow::fs::FileInfo, 
std::allocator<arrow::fs::FileInfo> > const&)#1})::LoopBody::Callback&&, 
std::vector<arrow::fs::FileInfo, std::allocator<arrow::fs::FileInfo> > const&) 
const (this=<optimized out>, f=..., next=...)
       at /home/joris/scipy/repos/arrow/cpp/src/arrow/util/future.h:150
   ```
   
   </details>
   
   It seems to be crashing while trying to format the directory name for an 
exception that is raised from `DoDeleteDirContentsAsync`.
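
To make the error message easier to read, here is a small Python model of the bounds check in `std::string::substr`, which throws `std::out_of_range` only when the position is strictly greater than the string size. The function `substr` below is an illustrative stand-in, not Arrow code. Note that the size of 18 in the message happens to equal the length of the bucket name `pyarrow-filesystem`; whether the string being sliced at `s3fs.cc:2411` is actually the bucket name is an assumption, not something the backtrace confirms.

```python
def substr(s: str, pos: int) -> str:
    """Model of std::string::substr(pos): fails only if pos > len(s)."""
    if pos > len(s):
        # Mirrors the libstdc++ message format seen in the output above.
        raise IndexError(
            f"basic_string::substr: __pos (which is {pos}) "
            f"> this->size() (which is {len(s)})"
        )
    return s[pos:]


bucket = "pyarrow-filesystem"
print(len(bucket))               # length of the bucket name
print(repr(substr(bucket, 18)))  # pos == size is allowed and yields ''

try:
    substr(bucket, 19)           # one past the end: rejected
except IndexError as exc:
    print(exc)
```

Under that assumption, an off-by-one such as skipping a separator that is not present in the string would produce exactly the reported position of 19.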
   
   

