[ 
https://issues.apache.org/jira/browse/ARROW-1808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16260841#comment-16260841
 ] 

ASF GitHub Bot commented on ARROW-1808:
---------------------------------------

wesm commented on issue #1337: ARROW-1808: [C++] Make RecordBatch, Table 
virtual interfaces for column access
URL: https://github.com/apache/arrow/pull/1337#issuecomment-346050608
 
 
   @kou it seems like there are two different issues here. 
   
   Here, a schema with 1 field was passed along with a list of 0 columns:
   
   ```diff
   -        record_batch = Arrow::RecordBatch.new(schema, 0, [])
   +        record_batch = Arrow::RecordBatch.new(schema,
   +                                              data.size,
   +                                              [build_boolean_array(data)])
   ```
   
   I believe this would result in segfaults even if the number of rows is 
non-zero. So having empty / length-0 record batches in the IPC writer code path 
is fine so long as the columns matches the schema. 
   
   The reason this bug was not caught before was that the 
`RecordBatch::columns_` member was being used to determine 
`RecordBatch::num_columns()`, whereas now we are using the schema. It seems 
like respecting the schema is the right approach. I could add boundschecking to 
`SimpleRecordBatch::column(i)` and return null if the index is out of bounds, 
would that help at all?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [C++] Make RecordBatch interface virtual to permit record batches that 
> lazy-materialize columns
> -----------------------------------------------------------------------------------------------
>
>                 Key: ARROW-1808
>                 URL: https://issues.apache.org/jira/browse/ARROW-1808
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Wes McKinney
>            Assignee: Wes McKinney
>              Labels: pull-request-available
>             Fix For: 0.8.0
>
>
> This should be looked at soon to prevent having to define a different virtual 
> interface for record batches. There are places where we are using the record 
> batch constructor directly, and in some third party code (like MapD), so this 
> might be good to get done for 0.8.0



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to