[ 
https://issues.apache.org/jira/browse/ARROW-1808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16259806#comment-16259806
 ] 

ASF GitHub Bot commented on ARROW-1808:
---------------------------------------

xhochy commented on a change in pull request #1337: ARROW-1808: [C++] Make 
RecordBatch, Table virtual interfaces for column access
URL: https://github.com/apache/arrow/pull/1337#discussion_r152104789
 
 

 ##########
 File path: cpp/src/arrow/record_batch.h
 ##########
 @@ -0,0 +1,154 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#ifndef ARROW_RECORD_BATCH_H
+#define ARROW_RECORD_BATCH_H
+
+#include <cstdint>
+#include <memory>
+#include <string>
+#include <vector>
+
+#include "arrow/array.h"
+#include "arrow/type.h"
+#include "arrow/util/macros.h"
+#include "arrow/util/visibility.h"
+
+namespace arrow {
+
+class KeyValueMetadata;
+class Status;
+
+/// \class RecordBatch
+/// \brief Collection of equal-length arrays matching a particular Schema
+///
+/// A record batch is table-like data structure that is semantically a sequence
+/// of fields, each a contiguous Arrow array
+class ARROW_EXPORT RecordBatch {
+ public:
+  virtual ~RecordBatch() = default;
+
+  /// \param[in] schema The record batch schema
+  /// \param[in] num_rows length of fields in the record batch. Each array
+  /// should have the same length as num_rows
+  /// \param[in] columns the record batch fields as vector of arrays
+  static std::shared_ptr<RecordBatch> Make(
+      const std::shared_ptr<Schema>& schema, int64_t num_rows,
+      const std::vector<std::shared_ptr<Array>>& columns);
+
+  /// \brief Move-based constructor for a vector of Array instances
+  static std::shared_ptr<RecordBatch> Make(const std::shared_ptr<Schema>& 
schema,
+                                           int64_t num_rows,
+                                           
std::vector<std::shared_ptr<Array>>&& columns);
+
+  /// \brief Construct record batch from vector of internal data structures
+  /// \since 0.5.0
+  ///
+  /// This class is only provided with an rvalue-reference for the input data,
+  /// and is intended for internal use, or advanced users.
+  ///
+  /// \param schema the record batch schema
+  /// \param num_rows the number of semantic rows in the record batch. This
+  /// should be equal to the length of each field
+  /// \param columns the data for the batch's columns
+  static std::shared_ptr<RecordBatch> Make(
+      const std::shared_ptr<Schema>& schema, int64_t num_rows,
+      std::vector<std::shared_ptr<ArrayData>>&& columns);
+
+  /// \brief Construct record batch by copying vector of array data
+  /// \since 0.5.0
+  static std::shared_ptr<RecordBatch> Make(
+      const std::shared_ptr<Schema>& schema, int64_t num_rows,
+      const std::vector<std::shared_ptr<ArrayData>>& columns);
+
+  /// \brief Determine if two record batches are exactly equal
+  /// \return true if batches are equal
+  bool Equals(const RecordBatch& other) const;
+
+  /// \brief Determine if two record batches are approximately equal
+  bool ApproxEquals(const RecordBatch& other) const;
+
+  // \return the table's schema
+  /// \return true if batches are equal
+  std::shared_ptr<Schema> schema() const { return schema_; }
+
+  /// \brief Retrieve an array from the record batch
+  /// \param[in] i field index, does not boundscheck
+  /// \return an Array object
+  virtual std::shared_ptr<Array> column(int i) const = 0;
+
+  /// \brief Retrieve an array's internaldata from the record batch
+  /// \param[in] i field index, does not boundscheck
+  /// \return an internal ArrayData object
+  virtual std::shared_ptr<ArrayData> column_data(int i) const = 0;
+
+  virtual std::shared_ptr<RecordBatch> ReplaceSchemaMetadata(
+      const std::shared_ptr<const KeyValueMetadata>& metadata) const = 0;
+
+  /// \brief Name in i-th column
+  const std::string& column_name(int i) const;
 
 Review comment:
   Shouldn't this be `std::string`?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [C++] Make RecordBatch interface virtual to permit record batches that 
> lazy-materialize columns
> -----------------------------------------------------------------------------------------------
>
>                 Key: ARROW-1808
>                 URL: https://issues.apache.org/jira/browse/ARROW-1808
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Wes McKinney
>            Assignee: Wes McKinney
>              Labels: pull-request-available
>             Fix For: 0.8.0
>
>
> This should be looked at soon to prevent having to define a different virtual 
> interface for record batches. There are places where we are using the record 
> batch constructor directly, and in some third party code (like MapD), so this 
> might be good to get done for 0.8.0



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to