Siyuan Zhuang created ARROW-3698:
------------------------------------

Summary: Segmentation fault when using large table in Gandiva
Key: ARROW-3698
URL: https://issues.apache.org/jira/browse/ARROW-3698
Project: Apache Arrow
Issue Type: Bug
Components: C++, Gandiva
Reporter: Siyuan Zhuang
{code}
>>> import pyarrow as pa
Registry has 519 pre-compiled functions
>>> import pandas as pd
>>> import numpy as np
>>> import pyarrow.gandiva as gandiva
>>> import timeit
>>>
>>> from matplotlib import pyplot as plt
>>> for scale in range(25, 26):
...     frame_data = 1.0 * np.random.randint(0, 100, size=(2**scale, 2))
...     df = pd.DataFrame(frame_data).add_prefix("col")
...     table = pa.Table.from_pandas(df)
...
>>>
>>> def float64_add(table):
...     builder = gandiva.TreeExprBuilder()
...     node_a = builder.make_field(table.schema.field_by_name("col0"))
...     node_b = builder.make_field(table.schema.field_by_name("col1"))
...     sum = builder.make_function(b"add", [node_a, node_b], pa.float64())
...     field_result = pa.field("c", pa.float64())
...     expr = builder.make_expression(sum, field_result)
...     projector = gandiva.make_projector(table.schema, [expr], pa.default_memory_pool())
...     return projector
...
>>> projector = float64_add(table)
>>> projector.evaluate(table.to_batches()[0])
[1]    36393 segmentation fault  python
{code}

The crash is caused by an integer overflow in Gandiva: [https://github.com/apache/arrow/blob/1a6545aa51f5f41f0233ee0a11ef87d21127c5ed/cpp/src/gandiva/projector.cc#L141]

The type used on that line should be `int64_t` instead of `int`.
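To make the arithmetic concrete, here is a minimal C++ sketch. It assumes, beyond what the report states, that the linked line sizes the output data buffer as the row count multiplied by the output type's bit width, and that `int` is 32 bits. `BytesForBits`, `DataBufferBytes`, and `FitsInInt` are hypothetical helpers written for this illustration, not Gandiva's actual code. Under those assumptions, the 2**25-row float64 table from the repro needs 2**25 * 64 = 2**31 bits, one more than `INT_MAX` (2**31 - 1), so that product cannot be held in an `int`; computed in `int64_t` it is exact.

{code}
#include <cstdint>
#include <limits>

// Hypothetical helper: bytes needed to hold `bits` bits, rounded up.
constexpr std::int64_t BytesForBits(std::int64_t bits) { return (bits + 7) / 8; }

// Hypothetical helper: size of a fixed-width output data buffer.
// Both operands are int64_t, so num_records * bit_width does not wrap
// for any table that fits in memory.
constexpr std::int64_t DataBufferBytes(std::int64_t num_records,
                                       std::int64_t bit_width) {
  return BytesForBits(num_records * bit_width);
}

// True if `value` can be stored in a plain `int` (assumed 32-bit) without overflow.
constexpr bool FitsInInt(std::int64_t value) {
  return value >= std::numeric_limits<int>::min() &&
         value <= std::numeric_limits<int>::max();
}

// The repro table: 2**25 rows of float64 (64 bits per value).
constexpr std::int64_t kRows = std::int64_t{1} << 25;
constexpr std::int64_t kFloat64Bits = 64;

// 2**25 * 64 = 2**31 bits: one past the largest value an int can hold...
static_assert(kRows * kFloat64Bits == (std::int64_t{1} << 31), "2**31 bits");
static_assert(!FitsInInt(kRows * kFloat64Bits), "bit count overflows int");

// ...while the resulting byte count (256 MiB) is comfortably small.
static_assert(DataBufferBytes(kRows, kFloat64Bits) == 268435456, "256 MiB");
static_assert(FitsInInt(DataBufferBytes(kRows, kFloat64Bits)), "byte count fits");
{code}

Under this assumption it is the intermediate bit count, not the final allocation size, that exceeds `int`; widening the operands to `int64_t` before multiplying avoids the wrap (signed overflow in `int` is undefined behavior in C++, which is consistent with a crash rather than a clean error).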