Siyuan Zhuang created ARROW-3698:
------------------------------------

             Summary: Segmentation fault when using large table in Gandiva
                 Key: ARROW-3698
                 URL: https://issues.apache.org/jira/browse/ARROW-3698
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++, Gandiva
            Reporter: Siyuan Zhuang


{code}
>>> import pyarrow as pa
Registry has 519 pre-compiled functions
>>> import pandas as pd
>>> import numpy as np
>>> import pyarrow.gandiva as gandiva
>>> import timeit
>>>
>>> from matplotlib import pyplot as plt
>>> for scale in range(25, 26):
...     frame_data = 1.0 * np.random.randint(0, 100, size=(2**scale, 2))
...     df = pd.DataFrame(frame_data).add_prefix("col")
...     table = pa.Table.from_pandas(df)
...
>>>
>>> def float64_add(table):
...     builder = gandiva.TreeExprBuilder()
...     node_a = builder.make_field(table.schema.field_by_name("col0"))
...     node_b = builder.make_field(table.schema.field_by_name("col1"))
...     sum = builder.make_function(b"add", [node_a, node_b], pa.float64())
...     field_result = pa.field("c", pa.float64())
...     expr = builder.make_expression(sum, field_result)
...     projector = gandiva.make_projector(table.schema, [expr],
...                                        pa.default_memory_pool())
...     return projector
...
>>> projector = float64_add(table)
>>> projector.evaluate(table.to_batches()[0])
[1] 36393 segmentation fault python
{code}
The crash is caused by an integer overflow in Gandiva:
[https://github.com/apache/arrow/blob/1a6545aa51f5f41f0233ee0a11ef87d21127c5ed/cpp/src/gandiva/projector.cc#L141]

The type used there should be `int64_t` instead of `int`.
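
To illustrate why 2**25 rows is enough to trigger the overflow, here is a minimal sketch. It rests on an assumption that is not confirmed in this report: that the linked line computes the output buffer size in bits (row count times the bit width of the type) and stores it in a 32-bit signed `int`. `wrap_int32` is a hypothetical helper, not part of Arrow or Gandiva; it only mimics two's-complement wraparound. In C++ signed overflow is undefined behavior, so wraparound is the typical outcome rather than a guaranteed one.

```python
INT32_MAX = 2**31 - 1


def wrap_int32(value):
    """Hypothetical helper: reinterpret a Python int as a 32-bit signed int."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


num_records = 2**25   # rows in the table from the reproduction
bit_width = 64        # bits per float64 value

# Assumed size computation: rows * bit width, expressed in bits.
bits_needed = num_records * bit_width    # 2147483648
fits_in_int32 = bits_needed <= INT32_MAX  # False: one past INT32_MAX

# What a 32-bit signed int would typically hold after the multiplication.
wrapped = wrap_int32(bits_needed)        # -2147483648

# Half as many rows (2**24) still fits, which would be consistent with
# the crash only appearing at this table size.
smaller = wrap_int32(2**24 * bit_width)  # 1073741824

print(bits_needed, fits_in_int32, wrapped, smaller)
```

Under this assumption the allocation size becomes negative at exactly 2**25 float64 rows, while a 64-bit integer holds the same product without trouble.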


