[ https://issues.apache.org/jira/browse/ARROW-3698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney reassigned ARROW-3698:
-----------------------------------

    Assignee: Siyuan Zhuang

> [C++] Segmentation fault when using a large table in Gandiva
> ------------------------------------------------------------
>
>                 Key: ARROW-3698
>                 URL: https://issues.apache.org/jira/browse/ARROW-3698
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Gandiva
>            Reporter: Siyuan Zhuang
>            Assignee: Siyuan Zhuang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.12.0
>
>          Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> {code}
> >>> import pyarrow as pa
> Registry has 519 pre-compiled functions
> >>> import pandas as pd
> >>> import numpy as np
> >>> import pyarrow.gandiva as gandiva
> >>> import timeit
> >>>
> >>> from matplotlib import pyplot as plt
> >>> for scale in range(25, 26):
> ...     frame_data = 1.0 * np.random.randint(0, 100, size=(2**scale, 2))
> ...     df = pd.DataFrame(frame_data).add_prefix("col")
> ...     table = pa.Table.from_pandas(df)
> ...
> >>>
> >>> def float64_add(table):
> ...     builder = gandiva.TreeExprBuilder()
> ...     node_a = builder.make_field(table.schema.field_by_name("col0"))
> ...     node_b = builder.make_field(table.schema.field_by_name("col1"))
> ...     sum_node = builder.make_function(b"add", [node_a, node_b], pa.float64())
> ...     field_result = pa.field("c", pa.float64())
> ...     expr = builder.make_expression(sum_node, field_result)
> ...     projector = gandiva.make_projector(table.schema, [expr], pa.default_memory_pool())
> ...     return projector
> ...
> >>> projector = float64_add(table)
> >>> projector.evaluate(table.to_batches()[0])
> [1] 36393 segmentation fault python
> {code}
> The crash is caused by an integer overflow in Gandiva at:
> [https://github.com/apache/arrow/blob/1a6545aa51f5f41f0233ee0a11ef87d21127c5ed/cpp/src/gandiva/projector.cc#L141]
> The type used there should be `int64_t` instead of `int`.
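The linked line is not quoted in the report, so the following sketch rests on an assumption: that the value at projector.cc#L141 is an output buffer length in bits, computed as rows × bit width and stored in a 32-bit `int`. Under that assumption, the reproduction's 2**25 rows of float64 (64 bits each) produce exactly 2**31 bits, one past the largest value a signed 32-bit `int` can hold. The helper `to_int32` is hypothetical and only emulates typical two's-complement wraparound.

```python
# Sketch of the suspected overflow, ASSUMING the size at projector.cc#L141 is a
# buffer length in bits computed as rows * bit_width and stored in a C++ `int`.
INT32_MAX = 2**31 - 1

def to_int32(value):
    """Reduce a Python int to 32-bit two's complement (hypothetical helper).

    This mimics what a 32-bit `int` typically ends up holding on common
    platforms; in C++ signed overflow is undefined behaviour, so this is
    only the usual observed result, not a guarantee.
    """
    value &= 0xFFFFFFFF
    return value - 2**32 if value > INT32_MAX else value

num_rows = 2**25       # rows in the reproduction (scale = 25)
bits_per_value = 64    # width of the float64 output column

bits_needed = num_rows * bits_per_value       # exact value: 2**31
as_int32 = to_int32(bits_needed)              # wraps to a negative size
as_int64 = bits_needed                        # fits easily in an int64_t
max_safe_rows = INT32_MAX // bits_per_value   # largest row count that fits in int

print(bits_needed, as_int32, max_safe_rows)
```

Under this assumption the largest row count that still fits is 2**25 - 1, which is consistent with the crash appearing at exactly `scale = 25`; widening the type to `int64_t` removes the limit for any realistic table size.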



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
