This is an automated email from the ASF dual-hosted git repository.

kou pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-dotnet.git

The following commit(s) were added to refs/heads/main by this push:
     new 9f6e823  chore: Add docfx configuration for generating API documentation (#28)
9f6e823 is described below

commit 9f6e8236c41df9abe9536688b5acf0e3092199a7
Author: Adam Reeve <adre...@gmail.com>
AuthorDate: Mon Sep 1 12:21:11 2025 +1200

    chore: Add docfx configuration for generating API documentation (#28)

    ## What's Changed

    * Adds docfx configuration for API docs
    * Copied README into docs/index.md to use as the landing page for the
    docs. I've removed dev/build related docs and added links into the API
    reference pages

    To generate and view the docs:
    ```
    cd docs
    dotnet tool install -g docfx
    docfx docfx.json
    docfx serve _site
    ```

    Closes #27.

    ---------

    Co-authored-by: Sutou Kouhei <k...@cozmixng.org>
---
 .github/workflows/test.yaml                        |  11 ++
 .../rat_exclude_files.txt => ci/scripts/docs.sh    |  19 ++-
 dev/release/rat_exclude_files.txt                  |   2 +
 .../rat_exclude_files.txt => docs/.gitignore       |   7 +-
 docs/docfx.json                                    |  61 +++++++++
 docs/images/README.md                              |  33 +++++
 docs/images/favicon.png                            | Bin 0 -> 9201 bytes
 docs/images/logo.svg                               |  25 ++++
 docs/index.md                                      | 151 +++++++++++++++++++++
 dev/release/rat_exclude_files.txt => docs/toc.yml  |  13 +-
 10 files changed, 307 insertions(+), 15 deletions(-)

diff --git a/.github/workflows/test.yaml b/.github/workflows/test.yaml
index 8cb7893..575a1ed 100644
--- a/.github/workflows/test.yaml
+++ b/.github/workflows/test.yaml
@@ -83,3 +83,14 @@ jobs:
       - name: Test
         shell: bash
         run: ci/scripts/test.sh $(pwd)
+
+  docs:
+    name: Build Documentation
+    runs-on: ubuntu-latest
+    timeout-minutes: 5
+    steps:
+      - name: Checkout
+        uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
+      - name: Build documentation
+        shell: bash
+        run: ci/scripts/docs.sh $(pwd)
diff --git a/dev/release/rat_exclude_files.txt b/ci/scripts/docs.sh
old mode 100644
new mode 100755
similarity index 79%
copy from dev/release/rat_exclude_files.txt
copy to ci/scripts/docs.sh
index 97d498a..9b837fc
--- a/dev/release/rat_exclude_files.txt
+++ b/ci/scripts/docs.sh
@@ -1,3 +1,5 @@
+#!/usr/bin/env bash
+#
 # Licensed to the Apache Software Foundation (ASF) under one
 # or more contributor license agreements. See the NOTICE file
 # distributed with this work for additional information
@@ -15,8 +17,15 @@
 # specific language governing permissions and limitations
 # under the License.
 
-*.csproj
-*.resx
-*.sln
-.github/pull_request_template.md
-src/Apache.Arrow/Flatbuf/*
+set -eux
+
+source_dir=${1}
+
+pushd "${source_dir}/docs"
+
+dotnet tool install -g docfx
+
+docfx metadata --warningsAsErrors docfx.json
+docfx build --warningsAsErrors docfx.json
+
+popd
diff --git a/dev/release/rat_exclude_files.txt b/dev/release/rat_exclude_files.txt
index 97d498a..c2b32b7 100644
--- a/dev/release/rat_exclude_files.txt
+++ b/dev/release/rat_exclude_files.txt
@@ -20,3 +20,5 @@
 *.sln
 .github/pull_request_template.md
 src/Apache.Arrow/Flatbuf/*
+docs/images/*.png
+docs/images/*.svg
diff --git a/dev/release/rat_exclude_files.txt b/docs/.gitignore
similarity index 90%
copy from dev/release/rat_exclude_files.txt
copy to docs/.gitignore
index 97d498a..e66aa1b 100644
--- a/dev/release/rat_exclude_files.txt
+++ b/docs/.gitignore
@@ -15,8 +15,5 @@
 # specific language governing permissions and limitations
 # under the License.
 
-*.csproj
-*.resx
-*.sln
-.github/pull_request_template.md
-src/Apache.Arrow/Flatbuf/*
+/api
+/_site
diff --git a/docs/docfx.json b/docs/docfx.json
new file mode 100644
index 0000000..0aa5847
--- /dev/null
+++ b/docs/docfx.json
@@ -0,0 +1,61 @@
+{
+  "$schema": "https://raw.githubusercontent.com/dotnet/docfx/main/schemas/docfx.schema.json",
+  "metadata": [
+    {
+      "src": [
+        {
+          "src": "../src",
+          "files": [
+            "**/*.csproj"
+          ]
+        }
+      ],
+      "dest": "api",
+      "properties": {
+        "ProduceReferenceAssembly": "true"
+      }
+    }
+  ],
+  "build": {
+    "content": [
+      {
+        "files": [
+          "**/*.{md,yml}"
+        ],
+        "exclude": [
+          "_site/**",
+          "images/**"
+        ]
+      }
+    ],
+    "resource": [
+      {
+        "files": [
+          "images/*"
+        ],
+        "exclude": [
+          "**/*.md"
+        ]
+      }
+    ],
+    "output": "_site",
+    "template": [
+      "default",
+      "modern"
+    ],
+    "globalMetadata": {
+      "_appFaviconPath": "images/favicon.png",
+      "_appLogoPath": "images/logo.svg",
+      "_appName": "Apache Arrow .NET",
+      "_appTitle": "Apache Arrow .NET",
+      "_appFooter": "© 2018 The Apache Software Foundation",
+      "_enableNewTab": true,
+      "_enableSearch": true
+    },
+    "markdownEngineProperties": {
+      "markdigExtensions": [
+        "attributes"
+      ]
+    }
+  }
+}
diff --git a/docs/images/README.md b/docs/images/README.md
new file mode 100644
index 0000000..0d8e05e
--- /dev/null
+++ b/docs/images/README.md
@@ -0,0 +1,33 @@
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements. See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership. The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License. You may obtain a copy of the License at
+  http://www.apache.org/licenses/LICENSE-2.0
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied. See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+# Images
+
+This directory contains images used in the Apache Arrow .NET documentation.
+
+## logo.svg
+
+This file is based on the file `arrow-logo_chevrons_black-txt_transparent-bg.svg`
+from the [Apache Arrow Visual Identity website](https://arrow.apache.org/visual_identity/)
+and has had the height and width modified to fit the documentation header,
+while maintaining the original aspect ratio.
+The rectangular outline was also removed.
+
+## favicon.png
+
+This file matches the favicon used in the Apache Arrow website and was copied from the
+[arrow-site repository](https://github.com/apache/arrow-site/blob/8884e2320ca131081a2617cdf93a222f0e92b6a3/img/logo.png).
diff --git a/docs/images/favicon.png b/docs/images/favicon.png
new file mode 100644
index 0000000..ee68a6c
Binary files /dev/null and b/docs/images/favicon.png differ
diff --git a/docs/images/logo.svg b/docs/images/logo.svg
new file mode 100644
index 0000000..5e7d9c6
--- /dev/null
+++ b/docs/images/logo.svg
@@ -0,0 +1,25 @@
+<?xml version='1.0' encoding='UTF-8' ?>
+<svg xmlns='http://www.w3.org/2000/svg' xmlns:xlink='http://www.w3.org/1999/xlink' class='svglite' width='43.43pt' height='38pt' viewBox='0 0 1350.00 1181.25'>
+<defs>
+  <style type='text/css'><![CDATA[
+    .svglite line, .svglite polyline, .svglite polygon, .svglite path, .svglite rect, .svglite circle {
+      fill: none;
+      stroke: #000000;
+      stroke-linecap: round;
+      stroke-linejoin: round;
+      stroke-miterlimit: 10.00;
+    }
+  ]]></style>
+</defs>
+<rect width='100%' height='100%' style='stroke: none; fill: none;'/>
+<defs>
+  <clipPath id='cpMC4wMHwxMzUwLjAwfDAuMDB8MTE4MS4yNQ=='>
+    <rect x='0.00' y='0.00' width='1350.00' height='1181.25' />
+  </clipPath>
+</defs>
+<g clip-path='url(#cpMC4wMHwxMzUwLjAwfDAuMDB8MTE4MS4yNQ==)'>
+<polygon points='168.75,168.75 590.62,590.62 168.75,1012.50 168.75,843.75 421.88,590.62 168.75,337.50 168.75,168.75 ' style='stroke-width: 1.07; fill: #000000;' />
+<polygon points='464.06,168.75 885.94,590.62 464.06,1012.50 464.06,843.75 717.19,590.62 464.06,337.50 464.06,168.75 ' style='stroke-width: 1.07; fill: #000000;' />
+<polygon points='759.38,168.75 1181.25,590.62 759.38,1012.50 759.38,843.75 1012.50,590.62 759.38,337.50 759.38,168.75 ' style='stroke-width: 1.07; fill: #000000;' />
+</g>
+</svg>
diff --git a/docs/index.md b/docs/index.md
new file mode 100644
index 0000000..732e339
--- /dev/null
+++ b/docs/index.md
@@ -0,0 +1,151 @@
+---
+_layout: landing
+---
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements. See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership. The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License. You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied. See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+# Apache Arrow .NET
+
+An implementation of Arrow targeting .NET.
+
+See our current [feature matrix](https://github.com/apache/arrow/blob/main/docs/source/status.rst)
+for currently available features.
+
+## Implementation
+
+- Arrow specification 1.0.0. (Support for reading 0.11+.)
+- C# 11
+- .NET Standard 2.0, .NET 6.0, .NET 8.0 and .NET Framework 4.6.2
+- Asynchronous I/O
+- Uses modern .NET runtime features such as **Span<T>**, **Memory<T>**, **MemoryManager<T>**, and **System.Buffers** primitives for memory allocation, memory storage, and fast serialization.
+- Uses **Acyclic Visitor Pattern** for array types and arrays to facilitate serialization, record batch traversal, and format growth.
+
+## Known Issues
+
+- Cannot read Arrow files containing tensors.
+- Cannot easily modify allocation strategy without implementing a custom memory pool. All allocations are currently 64-byte aligned and padded to 8-bytes.
+- Default memory allocation strategy uses an over-allocation strategy with pointer fixing, which results in significant memory overhead for small buffers. A buffer that requires a single byte for storage may be backed by an allocation of up to 64-bytes to satisfy alignment requirements.
+- There are currently few builder APIs available for specific array types. Arrays must be built manually with an arrow buffer builder abstraction.
+- FlatBuffer code generation is not included in the build process.
+- Serialization implementation does not perform exhaustive validation checks during deserialization in every scenario.
+- Throws exceptions with vague, inconsistent, or non-localized messages in many situations
+- Throws exceptions that are non-specific to the Arrow implementation in some circumstances where it probably should (eg. does not throw ArrowException exceptions)
+- Lack of code documentation
+- Lack of usage examples
+
+## Usage
+
+Example demonstrating reading [RecordBatches](xref:Apache.Arrow.RecordBatch) from an Arrow IPC file using an
+[ArrowFileReader](xref:Apache.Arrow.Ipc.ArrowFileReader):
+
+    using System.Diagnostics;
+    using System.IO;
+    using System.Threading.Tasks;
+    using Apache.Arrow;
+    using Apache.Arrow.Ipc;
+
+    public static async Task<RecordBatch> ReadArrowAsync(string filename)
+    {
+        using (var stream = File.OpenRead(filename))
+        using (var reader = new ArrowFileReader(stream))
+        {
+            var recordBatch = await reader.ReadNextRecordBatchAsync();
+            Debug.WriteLine("Read record batch with {0} column(s)", recordBatch.ColumnCount);
+            return recordBatch;
+        }
+    }
+
+
+## Status
+
+### Memory Management
+
+- Allocations are 64-byte aligned and padded to 8-bytes.
+- Allocations are automatically garbage collected
+
+### Arrays
+
+#### Primitive Types
+
+- [Int8](xref:Apache.Arrow.Types.Int8Type), [Int16](xref:Apache.Arrow.Types.Int16Type), [Int32](xref:Apache.Arrow.Types.Int32Type), [Int64](xref:Apache.Arrow.Types.Int64Type)
+- [UInt8](xref:Apache.Arrow.Types.UInt8Type), [UInt16](xref:Apache.Arrow.Types.UInt16Type), [UInt32](xref:Apache.Arrow.Types.UInt32Type), [UInt64](xref:Apache.Arrow.Types.UInt64Type)
+- [Float](xref:Apache.Arrow.Types.FloatType), [Double](xref:Apache.Arrow.Types.DoubleType), [Half-float](xref:Apache.Arrow.Types.HalfFloatType) (.NET 6+)
+- [Binary](xref:Apache.Arrow.Types.BinaryType) (variable-length)
+- [String](xref:Apache.Arrow.Types.StringType) (utf-8)
+- [Null](xref:Apache.Arrow.Types.NullType)
+
+#### Parametric Types
+
+- [Timestamp](xref:Apache.Arrow.Types.TimestampType)
+- [Date32](xref:Apache.Arrow.Types.Date32Type), [Date64](xref:Apache.Arrow.Types.Date64Type)
+- [Decimal32](xref:Apache.Arrow.Types.Decimal32Type), [Decimal64](xref:Apache.Arrow.Types.Decimal64Type), [Decimal128](xref:Apache.Arrow.Types.Decimal128Type), [Decimal256](xref:Apache.Arrow.Types.Decimal256Type)
+- [Time32](xref:Apache.Arrow.Types.Time32Type), [Time64](xref:Apache.Arrow.Types.Time64Type)
+- [Binary](xref:Apache.Arrow.Types.BinaryType) (fixed-length)
+- [List](xref:Apache.Arrow.Types.ListType)
+- [Struct](xref:Apache.Arrow.Types.StructType)
+- [Union](xref:Apache.Arrow.Types.UnionType)
+- [Map](xref:Apache.Arrow.Types.MapType)
+- [Duration](xref:Apache.Arrow.Types.DurationType)
+- [Interval](xref:Apache.Arrow.Types.IntervalType)
+
+#### Type Metadata
+
+- Data Types
+- [Fields](xref:Apache.Arrow.Field)
+- [Schema](xref:Apache.Arrow.Schema)
+
+#### Serialization
+
+- File [Reader](xref:Apache.Arrow.Ipc.ArrowFileReader) and [Writer](xref:Apache.Arrow.Ipc.ArrowFileWriter)
+- Stream [Reader](xref:Apache.Arrow.Ipc.ArrowStreamReader) and [Writer](xref:Apache.Arrow.Ipc.ArrowStreamWriter)
+
+### IPC Format
+
+#### Compression
+
+- Buffer compression and decompression is supported, but requires installing the `Apache.Arrow.Compression` package.
+  When reading compressed data, you must pass an [CompressionCodecFactory](xref:Apache.Arrow.Compression.CompressionCodecFactory)
+  instance to the [ArrowFileReader](xref:Apache.Arrow.Ipc.ArrowFileReader) or
+  [ArrowStreamReader](xref:Apache.Arrow.Ipc.ArrowStreamReader) constructor, and when writing compressed data a
+  [CompressionCodecFactory](xref:Apache.Arrow.Compression.CompressionCodecFactory) must be set in the
+  [IpcOptions](xref:Apache.Arrow.Ipc.IpcOptions).
+  Alternatively, a custom implementation of [ICompressionCodecFactory](xref:Apache.Arrow.Ipc.ICompressionCodecFactory) can be used.
+
+### Not Implemented
+
+- Serialization
+  - Exhaustive validation
+  - Run End Encoding
+- Types
+  - Tensor
+- Arrays
+  - Large Arrays. There are large array types provided to help with interoperability with other libraries,
+    but these do not support buffers larger than 2 GiB and an exception will be raised if trying to import an array that is too large.
+    - [Large Binary](xref:Apache.Arrow.Types.LargeBinaryType)
+    - [Large List](xref:Apache.Arrow.Types.LargeListType)
+    - [Large String](xref:Apache.Arrow.Types.LargeStringType)
+  - Views
+    - [Binary View](xref:Apache.Arrow.Types.BinaryViewType)
+    - [List View](xref:Apache.Arrow.Types.ListViewType)
+    - [String View](xref:Apache.Arrow.Types.StringViewType)
+- Array Operations
+  - Equality / Comparison
+  - Casting
+- Compute
+  - There is currently no API available for a compute / kernel abstraction.
diff --git a/dev/release/rat_exclude_files.txt b/docs/toc.yml
similarity index 83%
copy from dev/release/rat_exclude_files.txt
copy to docs/toc.yml
index 97d498a..1f897e4 100644
--- a/dev/release/rat_exclude_files.txt
+++ b/docs/toc.yml
@@ -1,3 +1,5 @@
+### YamlMime:TableOfContent
+
 # Licensed to the Apache Software Foundation (ASF) under one
 # or more contributor license agreements. See the NOTICE file
 # distributed with this work for additional information
@@ -15,8 +17,9 @@
 # specific language governing permissions and limitations
 # under the License.
 
-*.csproj
-*.resx
-*.sln
-.github/pull_request_template.md
-src/Apache.Arrow/Flatbuf/*
+items:
+- name: API Reference
+  type: Namespace
+  href: api/
+- name: GitHub
+  href: https://github.com/apache/arrow-dotnet