[GitHub] [arrow] drin commented on a change in pull request #9810: ARROW-11677: [C++][Docs] Add basic C++ datasets documentation

GitBox Thu, 08 Apr 2021 20:26:49 -0700


drin commented on a change in pull request #9810:
URL: https://github.com/apache/arrow/pull/9810#discussion_r610317754




##########
File path: docs/source/cpp/dataset.rst
##########
@@ -0,0 +1,381 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+
+..   http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied.  See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+.. default-domain:: cpp
+.. highlight:: cpp
+
+================
+Tabular Datasets
+================
+
+.. seealso::
+   :doc:`Dataset API reference <api/dataset>`
+
+.. warning::
+
+    The ``arrow::dataset`` namespace is experimental, and a stable API
+    is not yet guaranteed.
+
+The Arrow Datasets library provides functionality to efficiently work with
+tabular, potentially larger than memory and multi-file datasets:

Review comment:
       Actually, looking at the section below, maybe lift the first bullet 
point so that the Datasets library isn't defined as "a library for datasets"? 
This suggestion folds in @westonpace's next few comments and the following 
bullet points:
   
   ```
   The Arrow Datasets library provides a unified interface for
   operations on tabular data from various sources, in various
   formats. The data may come from in-memory buffers, local
   filesystems, or cloud filesystems, and may be larger than memory
   or span multiple units (e.g. multiple files). Supported data
   formats include Parquet, Feather, and IPC. Supported operations
   include discovery, predicate pushdown, and parallel reads.
   ```
   
   (newlines added based on formatting for this comment)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
