drin commented on a change in pull request #9810:
URL: https://github.com/apache/arrow/pull/9810#discussion_r610317754
##########
File path: docs/source/cpp/dataset.rst
##########
@@ -0,0 +1,381 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements. See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership. The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License. You may obtain a copy of the License at
+
+.. http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied. See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+.. default-domain:: cpp
+.. highlight:: cpp
+
+================
+Tabular Datasets
+================
+
+.. seealso::
+   :doc:`Dataset API reference <api/dataset>`
+
+.. warning::
+
+   The ``arrow::dataset`` namespace is experimental, and a stable API
+   is not yet guaranteed.
+
+The Arrow Datasets library provides functionality to efficiently work with
+tabular, potentially larger than memory and multi-file datasets:

Review comment:
Actually, looking at the section below, maybe lift the first bullet point so that the Datasets library isn't defined as a library for datasets? This suggestion incorporates @westonpace's next few comments and the following bullet points:

```
The Arrow Datasets library provides a unified interface for operations on tabular data from various sources, in various formats.
The data may come from buffers, local filesystems, or cloud filesystems, and may be larger than memory or span multiple units (e.g. multiple files).
Some supported data formats include Parquet, Feather, and IPC.
Some supported operations include discovery, predicate pushdown, and read parallelism.
```
(newlines added based on formatting for this comment)
