[ 
https://issues.apache.org/jira/browse/ARROW-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16447437#comment-16447437
 ] 

ASF GitHub Bot commented on ARROW-2457:
---------------------------------------

kou commented on a change in pull request #1921: ARROW-2457: [GLib] Support 
large is_valids in builder's append_values()
URL: https://github.com/apache/arrow/pull/1921#discussion_r183258050
 
 

 ##########
 File path: c_glib/arrow-glib/array-builder.cpp
 ##########
 @@ -65,13 +65,37 @@ garrow_array_builder_append_values(GArrowArrayBuilder 
*builder,
                   is_valids_length);
       return FALSE;
     }
-    uint8_t valid_bytes[is_valids_length];
-    for (gint64 i = 0; i < is_valids_length; ++i) {
-      valid_bytes[i] = is_valids[i];
+
+    const gint64 chunk_size = 4096;
+    gint64 n_chunks = is_valids_length / chunk_size;
+    gint64 n_remains = is_valids_length % chunk_size;
+    for (gint64 i = 0; i < n_chunks; ++i) {
+      uint8_t valid_bytes[chunk_size];
+      gint64 offset = chunk_size * i;
+      const gboolean *chunked_is_valids = is_valids + offset;
+      for (gint64 j = 0; j < chunk_size; ++j) {
+        valid_bytes[j] = chunked_is_valids[j];
+      }
+      status = arrow_builder->AppendValues(values + offset,
 
 Review comment:
   Thanks.
   I'll comment on it about this use case.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> garrow_array_builder_append_values() won't work for large arrays
> ----------------------------------------------------------------
>
>                 Key: ARROW-2457
>                 URL: https://issues.apache.org/jira/browse/ARROW-2457
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C, C++, GLib
>    Affects Versions: 0.8.0, 0.9.0
>            Reporter: Haralampos Gavriilidis
>            Assignee: Kouhei Sutou
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.10.0
>
>
> I am using garrow_array_builder_append_values() to transform a native C array 
> to an Arrow array, without calling arrow_array_builder_append multiple times. 
> When calling garrow_array_builder_append_values() in array-builder.cpp with 
> following signature:
> {code:java}
> garrow_array_builder_append_values(GArrowArrayBuilder *builder,
> const VALUE *values,
> gint64 values_length,
> const gboolean *is_valids,
> gint64 is_valids_length,
> GError **error,
> const gchar *context)
> {code}
> it will fail for large arrays. This is probably happening because the 
> is_valids array is copied to the valid_bytes array (of different type), for 
> which the memory is allocated on the stack, and not on the heap, like shown 
> on the snippet below:
> {code:java}
> uint8_t valid_bytes[is_valids_length];
> for (gint64 i = 0; i < is_valids_length; ++i){ 
>   valid_bytes[i] = is_valids[i]; 
> }
> {code}
>  A way to avoid this problem would be to allocate memory for the valid_bytes 
> array using malloc() or something similar. Is this behavior intended, maybe 
> because no large arrays should be handed over to that function, or it is 
> rather a bug?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to