[ 
https://issues.apache.org/jira/browse/ARROW-7873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17039320#comment-17039320
 ] 

Matt Calder commented on ARROW-7873:
------------------------------------

I attached an example of foo.pq. In case it isn't clear from my description of 
the problem, the odbc connection must be made in order to trigger the error. 
Just reading the parquet file works in both pandas 0.25.3 and 1.0.1. The read 
segfaults only after the odbc connection has been made, and only in pandas 
1.0.1. I wrote foo.pq using both 0.25.3 and 1.0.1, and in both cases I saw the 
segfault in 1.0.1 and not in 0.25.3. In short, I think the read is the 
problem, not the write. That said, the files do differ:

{noformat}
xbk@499e30e4f63f:~$ diff foo_101.pq foo_25.pq 
Binary files foo_101.pq and foo_25.pq differ
{noformat}
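
For anyone who wants to repeat the with/without-connection comparison without losing their interpreter to the crash, here is a minimal sketch that runs each case in a child process and reports how it exited. The helper name run_isolated and the two snippet constants are made up for illustration; the connection string and the file path are the ones from the issue description, and the snippets assume pandas, pyarrow, pyodbc and the clickhouse odbc driver are installed.

```python
import subprocess
import sys


def run_isolated(snippet: str) -> int:
    """Run a Python snippet in a child interpreter and return its exit status.

    On POSIX a negative value -N means the child was killed by signal N
    (for example -6 for SIGABRT or -11 for SIGSEGV).
    """
    result = subprocess.run([sys.executable, "-c", snippet])
    return result.returncode


# Hypothetical snippets mirroring the two cases described in this comment.
READ_ONLY = """
import pandas as pd
pd.read_parquet('/tmp/foo.pq')
"""

ODBC_THEN_READ = """
import pyodbc
import pandas as pd
con_str = "Driver=libclickhouseodbc.so;url=http://clickhouse/query;timeout=600"
with pyodbc.connect(con_str, autocommit=True) as con:
    pass
pd.read_parquet('/tmp/foo.pq')
"""

if __name__ == "__main__":
    for name, snippet in [("read only", READ_ONLY),
                          ("odbc then read", ODBC_THEN_READ)]:
        status = run_isolated(snippet)
        outcome = "killed by signal %d" % -status if status < 0 else "exit code %d" % status
        print("%s: %s" % (name, outcome))
```

Running the script under each pandas version should make it easy to fill in the full matrix, since a crash in the child shows up as a negative status rather than ending the test run.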


> [Python] Segfault in pandas version 1.0.1, read_parquet after creating a 
> clickhouse odbc connection
> ---------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-7873
>                 URL: https://issues.apache.org/jira/browse/ARROW-7873
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>         Environment: Ubuntu 18.04
>            Reporter: Matt Calder
>            Priority: Minor
>         Attachments: foo.pq
>
>
> [I posted this issue to the pandas 
> github|[https://github.com/pandas-dev/pandas/issues/31981]].
> We get a segfault when making a call to pd.read_parquet after having made a 
> connection to clickhouse via odbc. Like so,
> {code:python}
> import pyodbc
> import pandas as pd
> con_str = "Driver=libclickhouseodbc.so;url=http://clickhouse/query;timeout=600"
> with pyodbc.connect(con_str, autocommit=True) as con:
>     pass
> df = pd.DataFrame({'A': [1,1,1], 'B': ['a', 'b', 'c']})
> df.to_parquet('/tmp/foo.pq')
> # This line core dumps:
> pd.read_parquet('/tmp/foo.pq')
> {code}
> This happens with pandas version 1.0.1 but not with pandas 0.25.3. Here's a 
> stacktrace:
> {code:java}
> #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
> #1  0x00007ffff7a24801 in __GI_abort () at abort.c:79
> #2  0x00007ffff63c1957 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #3  0x00007ffff63c7ab6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #4  0x00007ffff63c7af1 in std::terminate() () from 
> /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #5  0x00007ffff63c7d24 in __cxa_throw () from 
> /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #6  0x00007ffff63c6a52 in __cxa_bad_cast () from 
> /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #7  0x00007ffff64131ec in std::__cxx11::collate<char> const& 
> std::use_facet<std::__cxx11::collate<char> >(std::locale const&) () from 
> /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #8  0x00007fffbe4b8279 in std::__cxx11::basic_string<char, 
> std::char_traits<char>, std::allocator<char> > 
> std::__cxx11::regex_traits<char>::transform_primary<char const*>(char const*, 
> char const*) const () from /usr/local/lib/libparquet.so.100
> #9  0x00007fffbe4bd71c in 
> std::__detail::_BracketMatcher<std::__cxx11::regex_traits<char>, false, 
> false>::_M_ready() () from /usr/local/lib/libparquet.so.100
> #10 0x00007fffbe4bda9e in void 
> std::__detail::_Compiler<std::__cxx11::regex_traits<char> 
> >::_M_insert_character_class_matcher<false, false>() () from 
> /usr/local/lib/libparquet.so.100
> #11 0x00007fffbe4c0569 in 
> std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_atom() () 
> from /usr/local/lib/libparquet.so.100
> #12 0x00007fffbe4c0ad8 in 
> std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative() 
> () from /usr/local/lib/libparquet.so.100
> #13 0x00007fffbe4c0a43 in 
> std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative() 
> () from /usr/local/lib/libparquet.so.100
> #14 0x00007fffbe4c0d1c in 
> std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_disjunction() 
> () from /usr/local/lib/libparquet.so.100
> #15 0x00007fffbe4c1469 in 
> std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_Compiler(char 
> const*, char const*, std::locale const&, 
> std::regex_constants::syntax_option_type) () from 
> /usr/local/lib/libparquet.so.100
> #16 0x00007fffbe4a93d1 in 
> parquet::ApplicationVersion::ApplicationVersion(std::__cxx11::basic_string<char,
>  std::char_traits<char>, std::allocator<char> > const&) () from 
> /usr/local/lib/libparquet.so.100
> #17 0x00007fffbe4c1c03 in 
> parquet::FileMetaData::FileMetaDataImpl::FileMetaDataImpl(void const*, 
> unsigned int*, std::shared_ptr<parquet::Decryptor> const&) () from 
> /usr/local/lib/libparquet.so.100
> #18 0x00007fffbe4a9e62 in parquet::FileMetaData::FileMetaData(void const*, 
> unsigned int*, std::shared_ptr<parquet::Decryptor> const&) () from 
> /usr/local/lib/libparquet.so.100
> #19 0x00007fffbe4a9ec2 in parquet::FileMetaData::Make(void const*, unsigned 
> int*, std::shared_ptr<parquet::Decryptor> const&) () from 
> /usr/local/lib/libparquet.so.100
> #20 0x00007fffbe48acaf in 
> parquet::SerializedFile::ParseUnencryptedFileMetadata(std::shared_ptr<arrow::Buffer>
>  const&, long, long, std::shared_ptr<arrow::Buffer>*, unsigned int*, unsigned 
> int*) () from /usr/local/lib/libparquet.so.100
> #21 0x00007fffbe492d75 in parquet::SerializedFile::ParseMetaData() () from 
> /usr/local/lib/libparquet.so.100
> #22 0x00007fffbe48d8f8 in 
> parquet::ParquetFileReader::Contents::Open(std::shared_ptr<arrow::io::RandomAccessFile>,
>  parquet::ReaderProperties const&, std::shared_ptr<parquet::FileMetaData>) () 
> from /usr/local/lib/libparquet.so.100
> #23 0x00007fffbe48e598 in 
> parquet::ParquetFileReader::Open(std::shared_ptr<arrow::io::RandomAccessFile>,
>  parquet::ReaderProperties const&, std::shared_ptr<parquet::FileMetaData>) () 
> from /usr/local/lib/libparquet.so.100
> #24 0x00007fffbe3a89bd in 
> parquet::arrow::FileReaderBuilder::Open(std::shared_ptr<arrow::io::RandomAccessFile>,
>  parquet::ReaderProperties const&, std::shared_ptr<parquet::FileMetaData>) () 
> from /usr/local/lib/libparquet.so.100
> #25 0x00007fffbe7dc348 in 
> __pyx_pf_7pyarrow_8_parquet_13ParquetReader_2open(__pyx_obj_7pyarrow_8_parquet_ParquetReader*,
>  _object*, int, _object*, __pyx_obj_7pyarrow_8_parquet_FileMetaData*, int) ()
>    from 
> /usr/local/lib/python3.6/dist-packages/pyarrow-0.15.1.dev539+g8cf0c8e0a-py3.6-linux-x86_64.egg/pyarrow/_parquet.cpython-36m-x86_64-linux-gnu.so
> #26 0x00007fffbe7dcbc9 in 
> __pyx_pw_7pyarrow_8_parquet_13ParquetReader_3open(_object*, _object*, 
> _object*) () from 
> /usr/local/lib/python3.6/dist-packages/pyarrow-0.15.1.dev539+g8cf0c8e0a-py3.6-linux-x86_64.egg/pyarrow/_parquet.cpython-36m-x86_64-linux-gnu.so
> #27 0x000000000050ac25 in _PyCFunction_FastCallDict (kwargs=<optimized out>, 
> nargs=<optimized out>, args=<optimized out>, func_obj=<built-in method open 
> of pyarrow._parquet.ParquetReader object at remote 0x7fffbfc6b938>) at 
> ../Objects/methodobject.c:231
> #28 _PyCFunction_FastCallKeywords (kwnames=<optimized out>, nargs=<optimized 
> out>, stack=<optimized out>, func=<optimized out>) at 
> ../Objects/methodobject.c:294
> #29 call_function.lto_priv () at ../Python/ceval.c:4851
> #30 0x000000000050d390 in _PyEval_EvalFrameDefault () at 
> ../Python/ceval.c:3351
> #31 0x0000000000508245 in PyEval_EvalFrameEx (throwflag=0, f=
>     Frame 0x142a818, for file 
> /usr/local/lib/python3.6/dist-packages/pyarrow-0.15.1.dev539+g8cf0c8e0a-py3.6-linux-x86_64.egg/pyarrow/parquet.py,
>  line 137, in __init__ 
> (self=<ParquetFile(reader=<pyarrow._parquet.ParquetReader at remote 
> 0x7fffbfc6b938>) at remote 0x7fffc4b68cc0>, source='/tmp/foo.pq', 
> metadata=None, common_metadata=None, read_dictionary=None, memory_map=False, 
> buffer_size=0)) at ../Python/ceval.c:754
> #32 _PyEval_EvalCodeWithName.lto_priv.1836 () at ../Python/ceval.c:4166
> #33 0x0000000000509642 in _PyFunction_FastCallDict () at 
> ../Python/ceval.c:5075
> #34 0x0000000000595311 in _PyObject_FastCallDict (kwargs={'metadata': None, 
> 'memory_map': False, 'read_dictionary': None, 'common_metadata': None, 
> 'buffer_size': 0}, nargs=2, args=0x7fffffffc430, func=<function at remote 
> 0x7fffbfc5e378>)
>     at ../Objects/abstract.c:2310
> #35 _PyObject_Call_Prepend (kwargs={'metadata': None, 'memory_map': False, 
> 'read_dictionary': None, 'common_metadata': None, 'buffer_size': 0}, 
> args=<optimized out>, obj=<optimized out>, func=<function at remote 
> 0x7fffbfc5e378>) at ../Objects/abstract.c:2373
> #36 method_call.lto_priv () at ../Objects/classobject.c:314
> #37 0x000000000054a6ff in PyObject_Call (kwargs={'metadata': None, 
> 'memory_map': False, 'read_dictionary': None, 'common_metadata': None, 
> 'buffer_size': 0}, args=('/tmp/foo.pq',), func=<method at remote 
> 0x7ffff7f67fc8>) at ../Objects/abstract.c:2261
> #38 slot_tp_init () at ../Objects/typeobject.c:6420
> #39 0x0000000000551b81 in type_call.lto_priv () at ../Objects/typeobject.c:915
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
