lidavidm commented on a change in pull request #12739: URL: https://github.com/apache/arrow/pull/12739#discussion_r839557302
########## File path: docs/source/java/quickstartguide.rst ########## @@ -0,0 +1,325 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor license agreements. See the NOTICE file +.. distributed with this work for additional information +.. regarding copyright ownership. The ASF licenses this file +.. to you under the Apache License, Version 2.0 (the +.. "License"); you may not use this file except in compliance +.. with the License. You may obtain a copy of the License at + +.. http://www.apache.org/licenses/LICENSE-2.0 + +.. Unless required by applicable law or agreed to in writing, +.. software distributed under the License is distributed on an +.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +.. KIND, either express or implied. See the License for the +.. specific language governing permissions and limitations +.. under the License. + +.. default-domain:: java +.. highlight:: java + +================= +Quick Start Guide +================= + +.. contents:: + +Arrow Java provides several building blocks. Data types describe the types of values; +``ValueVectors`` are sequences of typed values; ``fields`` describe the types of columns in +tabular data; ``schemas`` describe a sequence of columns in tabular data, and +``VectorSchemaRoot`` represents tabular data. Arrow also provides ``readers`` and +``writers`` for loading data from and persisting data to storage. + +Create a ValueVector +********************* + +**ValueVectors** represent a sequence of values of the same type. +They are also known as "arrays" in the columnar format. + +Example: create a vector of 32-bit integers representing ``[1, null, 2]``: + +.. 
code-block:: Java + + import org.apache.arrow.memory.BufferAllocator; + import org.apache.arrow.memory.RootAllocator; + import org.apache.arrow.vector.IntVector; + + try( + BufferAllocator allocator = new RootAllocator(); + IntVector intVector = new IntVector("fixed-size-primitive-layout", allocator); + ){ + intVector.allocateNew(3); + intVector.set(0,1); + intVector.setNull(1); + intVector.set(2,2); + intVector.setValueCount(3); + System.out.println("Vector created in memory: " + intVector); + } + +.. code-block:: shell + + Vector created in memory: [1, null, 2] + + +Example: create a vector of UTF-8 encoded strings representing ``["one", "two", "three"]``: + +.. code-block:: Java + + import org.apache.arrow.memory.BufferAllocator; + import org.apache.arrow.memory.RootAllocator; + import org.apache.arrow.vector.VarCharVector; + + try( + BufferAllocator allocator = new RootAllocator(); + VarCharVector varCharVector = new VarCharVector("variable-size-primitive-layout", allocator); + ){ + varCharVector.allocateNew(3); + varCharVector.set(0, "one".getBytes()); + varCharVector.set(1, "two".getBytes()); + varCharVector.set(2, "three".getBytes()); + varCharVector.setValueCount(3); + System.out.println("Vector created in memory: " + varCharVector); + } + +.. code-block:: shell + + Vector created in memory: [one, two, three] + +Create a Field +************** + +**Fields** are used to denote the particular columns of tabular data. +They consist of a name, a data type, a flag indicating whether the column can have null values, +and optional key-value metadata. + +Example: create a field named "document" of string type: + +.. 
code-block:: Java + + import org.apache.arrow.vector.types.pojo.ArrowType; + import org.apache.arrow.vector.types.pojo.Field; + import org.apache.arrow.vector.types.pojo.FieldType; + import java.util.HashMap; + import java.util.Map; + + Map<String, String> metadata = new HashMap<>(); + metadata.put("A", "Id card"); + metadata.put("B", "Passport"); + metadata.put("C", "Visa"); + Field document = new Field("document", + new FieldType(true, new ArrowType.Utf8(), /*dictionary*/ null, metadata), + /*children*/ null); + System.out.println("Field created: " + document + ", Metadata: " + document.getMetadata()); + +.. code-block:: shell + + Field created: document: Utf8, Metadata: {A=Id card, B=Passport, C=Visa} + +Create a Schema +*************** + +**Schema** holds a sequence of fields together with some optional metadata. Review comment: ```suggestion **Schemas** hold a sequence of fields together with some optional metadata. ``` ########## File path: docs/source/java/quickstartguide.rst ########## @@ -0,0 +1,325 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor license agreements. See the NOTICE file +.. distributed with this work for additional information +.. regarding copyright ownership. The ASF licenses this file +.. to you under the Apache License, Version 2.0 (the +.. "License"); you may not use this file except in compliance +.. with the License. You may obtain a copy of the License at + +.. http://www.apache.org/licenses/LICENSE-2.0 + +.. Unless required by applicable law or agreed to in writing, +.. software distributed under the License is distributed on an +.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +.. KIND, either express or implied. See the License for the +.. specific language governing permissions and limitations +.. under the License. + +.. default-domain:: java +.. highlight:: java + +================= +Quick Start Guide +================= + +.. 
contents:: + +Arrow Java provides several building blocks. Data types describe the types of values; +``ValueVectors`` are sequences of typed values; ``fields`` describe the types of columns in Review comment: We don't need to put any of these terms in a code font ########## File path: docs/source/java/quickstartguide.rst ########## @@ -0,0 +1,325 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor license agreements. See the NOTICE file +.. distributed with this work for additional information +.. regarding copyright ownership. The ASF licenses this file +.. to you under the Apache License, Version 2.0 (the +.. "License"); you may not use this file except in compliance +.. with the License. You may obtain a copy of the License at + +.. http://www.apache.org/licenses/LICENSE-2.0 + +.. Unless required by applicable law or agreed to in writing, +.. software distributed under the License is distributed on an +.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +.. KIND, either express or implied. See the License for the +.. specific language governing permissions and limitations +.. under the License. + +.. default-domain:: java +.. highlight:: java + +================= +Quick Start Guide +================= + +.. contents:: + +Arrow Java provides several building blocks. Data types describe the types of values; +``ValueVectors`` are sequences of typed values; ``fields`` describe the types of columns in +tabular data; ``schemas`` describe a sequence of columns in tabular data, and +``VectorSchemaRoot`` represents tabular data. Arrow also provides ``readers`` and +``writers`` for loading data from and persisting data to storage. + +Create a ValueVector +********************* + +**ValueVectors** represent a sequence of values of the same type. +They are also known as "arrays" in the columnar format. + +Example: create a vector of 32-bit integers representing ``[1, null, 2]``: + +.. 
code-block:: Java + + import org.apache.arrow.memory.BufferAllocator; + import org.apache.arrow.memory.RootAllocator; + import org.apache.arrow.vector.IntVector; + + try( + BufferAllocator allocator = new RootAllocator(); + IntVector intVector = new IntVector("fixed-size-primitive-layout", allocator); + ){ + intVector.allocateNew(3); + intVector.set(0,1); + intVector.setNull(1); + intVector.set(2,2); + intVector.setValueCount(3); + System.out.println("Vector created in memory: " + intVector); + } + +.. code-block:: shell + + Vector created in memory: [1, null, 2] + + +Example: create a vector of UTF-8 encoded strings representing ``["one", "two", "three"]``: + +.. code-block:: Java + + import org.apache.arrow.memory.BufferAllocator; + import org.apache.arrow.memory.RootAllocator; + import org.apache.arrow.vector.VarCharVector; + + try( + BufferAllocator allocator = new RootAllocator(); + VarCharVector varCharVector = new VarCharVector("variable-size-primitive-layout", allocator); + ){ + varCharVector.allocateNew(3); + varCharVector.set(0, "one".getBytes()); + varCharVector.set(1, "two".getBytes()); + varCharVector.set(2, "three".getBytes()); + varCharVector.setValueCount(3); + System.out.println("Vector created in memory: " + varCharVector); + } + +.. code-block:: shell + + Vector created in memory: [one, two, three] + +Create a Field +************** + +**Fields** are used to denote the particular columns of tabular data. +They consist of a name, a data type, a flag indicating whether the column can have null values, +and optional key-value metadata. + +Example: create a field named "document" of string type: + +.. 
code-block:: Java + + import org.apache.arrow.vector.types.pojo.ArrowType; + import org.apache.arrow.vector.types.pojo.Field; + import org.apache.arrow.vector.types.pojo.FieldType; + import java.util.HashMap; + import java.util.Map; + + Map<String, String> metadata = new HashMap<>(); + metadata.put("A", "Id card"); + metadata.put("B", "Passport"); + metadata.put("C", "Visa"); + Field document = new Field("document", + new FieldType(true, new ArrowType.Utf8(), /*dictionary*/ null, metadata), + /*children*/ null); + System.out.println("Field created: " + document + ", Metadata: " + document.getMetadata()); + +.. code-block:: shell + + Field created: document: Utf8, Metadata: {A=Id card, B=Passport, C=Visa} + +Create a Schema +*************** + +**Schema** holds a sequence of fields together with some optional metadata. + +Example: Create a schema describing datasets with two columns: +a int32 column "A" and a utf8-encoded string column "B" + +.. code-block:: Java + + import org.apache.arrow.vector.types.pojo.ArrowType; + import org.apache.arrow.vector.types.pojo.Field; + import org.apache.arrow.vector.types.pojo.FieldType; + import org.apache.arrow.vector.types.pojo.Schema; + import java.util.HashMap; + import java.util.Map; + import static java.util.Arrays.asList; + + Map<String, String> metadata = new HashMap<>(); + metadata.put("K1", "V1"); + metadata.put("K2", "V2"); + Field a = new Field("A", FieldType.nullable(new ArrowType.Int(32, true)), /*children*/ null); + Field b = new Field("B", FieldType.nullable(new ArrowType.Utf8()), /*children*/ null); + Schema schema = new Schema(asList(a, b), metadata); + System.out.println("Schema created: " + schema); + +.. code-block:: shell + + Schema created: Schema<A: Int(32, true), B: Utf8>(metadata: {K1=V1, K2=V2}) + +Create a VectorSchemaRoot +************************* + +**VectorSchemaRoot** combines ValueVectors with a Schema to represent tabular data. 
+ +Example: Create a dataset with metadata that contains integer age and +string names of data. Review comment: ```suggestion Example: Create a dataset of names (strings) and ages (32-bit signed integers). ``` ########## File path: docs/source/java/quickstartguide.rst ########## @@ -0,0 +1,325 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor license agreements. See the NOTICE file +.. distributed with this work for additional information +.. regarding copyright ownership. The ASF licenses this file +.. to you under the Apache License, Version 2.0 (the +.. "License"); you may not use this file except in compliance +.. with the License. You may obtain a copy of the License at + +.. http://www.apache.org/licenses/LICENSE-2.0 + +.. Unless required by applicable law or agreed to in writing, +.. software distributed under the License is distributed on an +.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +.. KIND, either express or implied. See the License for the +.. specific language governing permissions and limitations +.. under the License. + +.. default-domain:: java +.. highlight:: java + +================= +Quick Start Guide +================= + +.. contents:: + +Arrow Java provides several building blocks. Data types describe the types of values; +``ValueVectors`` are sequences of typed values; ``fields`` describe the types of columns in +tabular data; ``schemas`` describe a sequence of columns in tabular data, and +``VectorSchemaRoot`` represents tabular data. Arrow also provides ``readers`` and +``writers`` for loading data from and persisting data to storage. + +Create a ValueVector +********************* + +**ValueVectors** represent a sequence of values of the same type. +They are also known as "arrays" in the columnar format. + +Example: create a vector of 32-bit integers representing ``[1, null, 2]``: + +.. 
code-block:: Java + + import org.apache.arrow.memory.BufferAllocator; + import org.apache.arrow.memory.RootAllocator; + import org.apache.arrow.vector.IntVector; + + try( + BufferAllocator allocator = new RootAllocator(); + IntVector intVector = new IntVector("fixed-size-primitive-layout", allocator); + ){ + intVector.allocateNew(3); + intVector.set(0,1); + intVector.setNull(1); + intVector.set(2,2); + intVector.setValueCount(3); + System.out.println("Vector created in memory: " + intVector); + } + +.. code-block:: shell + + Vector created in memory: [1, null, 2] + + +Example: create a vector of UTF-8 encoded strings representing ``["one", "two", "three"]``: + +.. code-block:: Java + + import org.apache.arrow.memory.BufferAllocator; + import org.apache.arrow.memory.RootAllocator; + import org.apache.arrow.vector.VarCharVector; + + try( + BufferAllocator allocator = new RootAllocator(); + VarCharVector varCharVector = new VarCharVector("variable-size-primitive-layout", allocator); + ){ + varCharVector.allocateNew(3); + varCharVector.set(0, "one".getBytes()); + varCharVector.set(1, "two".getBytes()); + varCharVector.set(2, "three".getBytes()); + varCharVector.setValueCount(3); + System.out.println("Vector created in memory: " + varCharVector); + } + +.. code-block:: shell + + Vector created in memory: [one, two, three] + +Create a Field +************** + +**Fields** are used to denote the particular columns of tabular data. +They consist of a name, a data type, a flag indicating whether the column can have null values, +and optional key-value metadata. + +Example: create a field named "document" of string type: + +.. 
code-block:: Java + + import org.apache.arrow.vector.types.pojo.ArrowType; + import org.apache.arrow.vector.types.pojo.Field; + import org.apache.arrow.vector.types.pojo.FieldType; + import java.util.HashMap; + import java.util.Map; + + Map<String, String> metadata = new HashMap<>(); + metadata.put("A", "Id card"); + metadata.put("B", "Passport"); + metadata.put("C", "Visa"); + Field document = new Field("document", + new FieldType(true, new ArrowType.Utf8(), /*dictionary*/ null, metadata), + /*children*/ null); + System.out.println("Field created: " + document + ", Metadata: " + document.getMetadata()); + +.. code-block:: shell + + Field created: document: Utf8, Metadata: {A=Id card, B=Passport, C=Visa} + +Create a Schema +*************** + +**Schema** holds a sequence of fields together with some optional metadata. + +Example: Create a schema describing datasets with two columns: +a int32 column "A" and a utf8-encoded string column "B" + +.. code-block:: Java + + import org.apache.arrow.vector.types.pojo.ArrowType; + import org.apache.arrow.vector.types.pojo.Field; + import org.apache.arrow.vector.types.pojo.FieldType; + import org.apache.arrow.vector.types.pojo.Schema; + import java.util.HashMap; + import java.util.Map; + import static java.util.Arrays.asList; + + Map<String, String> metadata = new HashMap<>(); + metadata.put("K1", "V1"); + metadata.put("K2", "V2"); + Field a = new Field("A", FieldType.nullable(new ArrowType.Int(32, true)), /*children*/ null); + Field b = new Field("B", FieldType.nullable(new ArrowType.Utf8()), /*children*/ null); + Schema schema = new Schema(asList(a, b), metadata); + System.out.println("Schema created: " + schema); + +.. code-block:: shell + + Schema created: Schema<A: Int(32, true), B: Utf8>(metadata: {K1=V1, K2=V2}) + +Create a VectorSchemaRoot +************************* + +**VectorSchemaRoot** combines ValueVectors with a Schema to represent tabular data. 
+ +Example: Create a dataset with metadata that contains integer age and +string names of data. + +.. code-block:: Java + + import org.apache.arrow.memory.BufferAllocator; + import org.apache.arrow.memory.RootAllocator; + import org.apache.arrow.vector.IntVector; + import org.apache.arrow.vector.VarCharVector; + import org.apache.arrow.vector.VectorSchemaRoot; + import org.apache.arrow.vector.types.pojo.ArrowType; + import org.apache.arrow.vector.types.pojo.Field; + import org.apache.arrow.vector.types.pojo.FieldType; + import org.apache.arrow.vector.types.pojo.Schema; + import java.nio.charset.StandardCharsets; + import java.util.HashMap; + import java.util.Map; + import static java.util.Arrays.asList; + + Field a = new Field("age", + FieldType.nullable(new ArrowType.Int(32, true)), + /*children*/null + ); + Field b = new Field("name", + new FieldType(true, new ArrowType.Utf8(), /*dictionary*/ null, /*metadata*/ null), Review comment: Can we use `FieldType.nullable` here? ########## File path: docs/source/java/quickstartguide.rst ########## @@ -0,0 +1,325 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor license agreements. See the NOTICE file +.. distributed with this work for additional information +.. regarding copyright ownership. The ASF licenses this file +.. to you under the Apache License, Version 2.0 (the +.. "License"); you may not use this file except in compliance +.. with the License. You may obtain a copy of the License at + +.. http://www.apache.org/licenses/LICENSE-2.0 + +.. Unless required by applicable law or agreed to in writing, +.. software distributed under the License is distributed on an +.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +.. KIND, either express or implied. See the License for the +.. specific language governing permissions and limitations +.. under the License. + +.. default-domain:: java +.. highlight:: java + +================= +Quick Start Guide +================= + +.. 
contents:: + +Arrow Java provides several building blocks. Data types describe the types of values; +``ValueVectors`` are sequences of typed values; ``fields`` describe the types of columns in +tabular data; ``schemas`` describe a sequence of columns in tabular data, and +``VectorSchemaRoot`` represents tabular data. Arrow also provides ``readers`` and +``writers`` for loading data from and persisting data to storage. + +Create a ValueVector +********************* + +**ValueVectors** represent a sequence of values of the same type. +They are also known as "arrays" in the columnar format. + +Example: create a vector of 32-bit integers representing ``[1, null, 2]``: + +.. code-block:: Java + + import org.apache.arrow.memory.BufferAllocator; + import org.apache.arrow.memory.RootAllocator; + import org.apache.arrow.vector.IntVector; + + try( + BufferAllocator allocator = new RootAllocator(); + IntVector intVector = new IntVector("fixed-size-primitive-layout", allocator); + ){ + intVector.allocateNew(3); + intVector.set(0,1); + intVector.setNull(1); + intVector.set(2,2); + intVector.setValueCount(3); + System.out.println("Vector created in memory: " + intVector); + } + +.. code-block:: shell + + Vector created in memory: [1, null, 2] + + +Example: create a vector of UTF-8 encoded strings representing ``["one", "two", "three"]``: + +.. code-block:: Java + + import org.apache.arrow.memory.BufferAllocator; + import org.apache.arrow.memory.RootAllocator; + import org.apache.arrow.vector.VarCharVector; + + try( + BufferAllocator allocator = new RootAllocator(); + VarCharVector varCharVector = new VarCharVector("variable-size-primitive-layout", allocator); + ){ + varCharVector.allocateNew(3); + varCharVector.set(0, "one".getBytes()); + varCharVector.set(1, "two".getBytes()); + varCharVector.set(2, "three".getBytes()); + varCharVector.setValueCount(3); + System.out.println("Vector created in memory: " + varCharVector); + } + +.. 
code-block:: shell + + Vector created in memory: [one, two, three] + +Create a Field +************** + +**Fields** are used to denote the particular columns of tabular data. +They consist of a name, a data type, a flag indicating whether the column can have null values, +and optional key-value metadata. + +Example: create a field named "document" of string type: + +.. code-block:: Java + + import org.apache.arrow.vector.types.pojo.ArrowType; + import org.apache.arrow.vector.types.pojo.Field; + import org.apache.arrow.vector.types.pojo.FieldType; + import java.util.HashMap; + import java.util.Map; + + Map<String, String> metadata = new HashMap<>(); + metadata.put("A", "Id card"); + metadata.put("B", "Passport"); + metadata.put("C", "Visa"); + Field document = new Field("document", + new FieldType(true, new ArrowType.Utf8(), /*dictionary*/ null, metadata), + /*children*/ null); + System.out.println("Field created: " + document + ", Metadata: " + document.getMetadata()); + +.. code-block:: shell + + Field created: document: Utf8, Metadata: {A=Id card, B=Passport, C=Visa} + +Create a Schema +*************** + +**Schema** holds a sequence of fields together with some optional metadata. + +Example: Create a schema describing datasets with two columns: +a int32 column "A" and a utf8-encoded string column "B" + +.. 
code-block:: Java + + import org.apache.arrow.vector.types.pojo.ArrowType; + import org.apache.arrow.vector.types.pojo.Field; + import org.apache.arrow.vector.types.pojo.FieldType; + import org.apache.arrow.vector.types.pojo.Schema; + import java.util.HashMap; + import java.util.Map; + import static java.util.Arrays.asList; + + Map<String, String> metadata = new HashMap<>(); + metadata.put("K1", "V1"); + metadata.put("K2", "V2"); + Field a = new Field("A", FieldType.nullable(new ArrowType.Int(32, true)), /*children*/ null); + Field b = new Field("B", FieldType.nullable(new ArrowType.Utf8()), /*children*/ null); + Schema schema = new Schema(asList(a, b), metadata); + System.out.println("Schema created: " + schema); + +.. code-block:: shell + + Schema created: Schema<A: Int(32, true), B: Utf8>(metadata: {K1=V1, K2=V2}) + +Create a VectorSchemaRoot +************************* + +**VectorSchemaRoot** combines ValueVectors with a Schema to represent tabular data. + +Example: Create a dataset with metadata that contains integer age and +string names of data. + +.. 
code-block:: Java + + import org.apache.arrow.memory.BufferAllocator; + import org.apache.arrow.memory.RootAllocator; + import org.apache.arrow.vector.IntVector; + import org.apache.arrow.vector.VarCharVector; + import org.apache.arrow.vector.VectorSchemaRoot; + import org.apache.arrow.vector.types.pojo.ArrowType; + import org.apache.arrow.vector.types.pojo.Field; + import org.apache.arrow.vector.types.pojo.FieldType; + import org.apache.arrow.vector.types.pojo.Schema; + import java.nio.charset.StandardCharsets; + import java.util.HashMap; + import java.util.Map; + import static java.util.Arrays.asList; + + Field a = new Field("age", + FieldType.nullable(new ArrowType.Int(32, true)), + /*children*/null + ); + Field b = new Field("name", + new FieldType(true, new ArrowType.Utf8(), /*dictionary*/ null, /*metadata*/ null), + /*children*/null + ); + Schema schema = new Schema(asList(a, b), /*metadata*/ null); + try( + BufferAllocator allocator = new RootAllocator(); + VectorSchemaRoot root = VectorSchemaRoot.create(schema, allocator); + IntVector intVectorA = (IntVector) root.getVector("age"); + VarCharVector varCharVectorB = (VarCharVector) root.getVector("name"); + ){ + root.setRowCount(3); + intVectorA.allocateNew(3); + intVectorA.set(0, 10); + intVectorA.set(1, 20); + intVectorA.set(2, 30); + varCharVectorB.allocateNew(3); + varCharVectorB.set(0, "Dave".getBytes(StandardCharsets.UTF_8)); + varCharVectorB.set(1, "Peter".getBytes(StandardCharsets.UTF_8)); + varCharVectorB.set(2, "Mary".getBytes(StandardCharsets.UTF_8)); + System.out.println("VectorSchemaRoot created: \n" + root.contentToTSVString()); + } + +.. code-block:: shell + + VectorSchemaRoot created: + age name + 10 Dave + 20 Peter + 30 Mary + + +Interprocess Communication (IPC) +******************************** + +Arrow data can be written to and read from disk, and both of these can be done in +a streaming and/or random-access fashion depending on application requirements. 
+ +**Write data to an arrow file** + +Example: Write the dataset from the previous example to an Arrow random-access file. + +.. code-block:: Java + + + import org.apache.arrow.memory.BufferAllocator; + import org.apache.arrow.memory.RootAllocator; + import org.apache.arrow.vector.IntVector; + import org.apache.arrow.vector.VarCharVector; + import org.apache.arrow.vector.VectorSchemaRoot; + import org.apache.arrow.vector.ipc.ArrowFileWriter; + import org.apache.arrow.vector.types.pojo.ArrowType; + import org.apache.arrow.vector.types.pojo.Field; + import org.apache.arrow.vector.types.pojo.FieldType; + import org.apache.arrow.vector.types.pojo.Schema; + import java.io.File; + import java.io.FileOutputStream; + import java.io.IOException; + import java.nio.charset.StandardCharsets; + import java.util.HashMap; + import java.util.Map; + import static java.util.Arrays.asList; + + Map<String, String> metadataField = new HashMap<>(); + metadataField.put("K1-Field", "K1F1"); + metadataField.put("K2-Field", "K2F2"); Review comment: IMO, we can remove the metadata here to keep this example more manageable. ########## File path: docs/source/java/quickstartguide.rst ########## @@ -0,0 +1,325 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor license agreements. See the NOTICE file +.. distributed with this work for additional information +.. regarding copyright ownership. The ASF licenses this file +.. to you under the Apache License, Version 2.0 (the +.. "License"); you may not use this file except in compliance +.. with the License. You may obtain a copy of the License at + +.. http://www.apache.org/licenses/LICENSE-2.0 + +.. Unless required by applicable law or agreed to in writing, +.. software distributed under the License is distributed on an +.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +.. KIND, either express or implied. See the License for the +.. specific language governing permissions and limitations +.. 
under the License. + +.. default-domain:: java +.. highlight:: java + +================= +Quick Start Guide +================= + +.. contents:: + +Arrow Java provides several building blocks. Data types describe the types of values; +``ValueVectors`` are sequences of typed values; ``fields`` describe the types of columns in +tabular data; ``schemas`` describe a sequence of columns in tabular data, and +``VectorSchemaRoot`` represents tabular data. Arrow also provides ``readers`` and +``writers`` for loading data from and persisting data to storage. + +Create a ValueVector +********************* + +**ValueVectors** represent a sequence of values of the same type. +They are also known as "arrays" in the columnar format. + +Example: create a vector of 32-bit integers representing ``[1, null, 2]``: + +.. code-block:: Java + + import org.apache.arrow.memory.BufferAllocator; + import org.apache.arrow.memory.RootAllocator; + import org.apache.arrow.vector.IntVector; + + try( + BufferAllocator allocator = new RootAllocator(); + IntVector intVector = new IntVector("fixed-size-primitive-layout", allocator); + ){ + intVector.allocateNew(3); + intVector.set(0,1); + intVector.setNull(1); + intVector.set(2,2); + intVector.setValueCount(3); + System.out.println("Vector created in memory: " + intVector); + } + +.. code-block:: shell + + Vector created in memory: [1, null, 2] + + +Example: create a vector of UTF-8 encoded strings representing ``["one", "two", "three"]``: + +.. 
code-block:: Java + + import org.apache.arrow.memory.BufferAllocator; + import org.apache.arrow.memory.RootAllocator; + import org.apache.arrow.vector.VarCharVector; + + try( + BufferAllocator allocator = new RootAllocator(); + VarCharVector varCharVector = new VarCharVector("variable-size-primitive-layout", allocator); + ){ + varCharVector.allocateNew(3); + varCharVector.set(0, "one".getBytes()); + varCharVector.set(1, "two".getBytes()); + varCharVector.set(2, "three".getBytes()); + varCharVector.setValueCount(3); + System.out.println("Vector created in memory: " + varCharVector); + } + +.. code-block:: shell + + Vector created in memory: [one, two, three] + +Create a Field +************** + +**Fields** are used to denote the particular columns of tabular data. +They consist of a name, a data type, a flag indicating whether the column can have null values, +and optional key-value metadata. + +Example: create a field named "document" of string type: + +.. code-block:: Java + + import org.apache.arrow.vector.types.pojo.ArrowType; + import org.apache.arrow.vector.types.pojo.Field; + import org.apache.arrow.vector.types.pojo.FieldType; + import java.util.HashMap; + import java.util.Map; + + Map<String, String> metadata = new HashMap<>(); + metadata.put("A", "Id card"); + metadata.put("B", "Passport"); + metadata.put("C", "Visa"); + Field document = new Field("document", + new FieldType(true, new ArrowType.Utf8(), /*dictionary*/ null, metadata), + /*children*/ null); + System.out.println("Field created: " + document + ", Metadata: " + document.getMetadata()); + +.. code-block:: shell + + Field created: document: Utf8, Metadata: {A=Id card, B=Passport, C=Visa} + +Create a Schema +*************** + +**Schema** holds a sequence of fields together with some optional metadata. + +Example: Create a schema describing datasets with two columns: +a int32 column "A" and a utf8-encoded string column "B" + +.. 
code-block:: Java + + import org.apache.arrow.vector.types.pojo.ArrowType; + import org.apache.arrow.vector.types.pojo.Field; + import org.apache.arrow.vector.types.pojo.FieldType; + import org.apache.arrow.vector.types.pojo.Schema; + import java.util.HashMap; + import java.util.Map; + import static java.util.Arrays.asList; + + Map<String, String> metadata = new HashMap<>(); + metadata.put("K1", "V1"); + metadata.put("K2", "V2"); + Field a = new Field("A", FieldType.nullable(new ArrowType.Int(32, true)), /*children*/ null); + Field b = new Field("B", FieldType.nullable(new ArrowType.Utf8()), /*children*/ null); + Schema schema = new Schema(asList(a, b), metadata); + System.out.println("Schema created: " + schema); + +.. code-block:: shell + + Schema created: Schema<A: Int(32, true), B: Utf8>(metadata: {K1=V1, K2=V2}) + +Create a VectorSchemaRoot +************************* + +**VectorSchemaRoot** combines ValueVectors with a Schema to represent tabular data. + +Example: Create a dataset with metadata that contains integer age and +string names of data. + +.. 
code-block:: Java + + import org.apache.arrow.memory.BufferAllocator; + import org.apache.arrow.memory.RootAllocator; + import org.apache.arrow.vector.IntVector; + import org.apache.arrow.vector.VarCharVector; + import org.apache.arrow.vector.VectorSchemaRoot; + import org.apache.arrow.vector.types.pojo.ArrowType; + import org.apache.arrow.vector.types.pojo.Field; + import org.apache.arrow.vector.types.pojo.FieldType; + import org.apache.arrow.vector.types.pojo.Schema; + import java.nio.charset.StandardCharsets; + import java.util.HashMap; + import java.util.Map; + import static java.util.Arrays.asList; + + Field a = new Field("age", + FieldType.nullable(new ArrowType.Int(32, true)), + /*children*/null + ); + Field b = new Field("name", + new FieldType(true, new ArrowType.Utf8(), /*dictionary*/ null, /*metadata*/ null), + /*children*/null + ); + Schema schema = new Schema(asList(a, b), /*metadata*/ null); + try( + BufferAllocator allocator = new RootAllocator(); + VectorSchemaRoot root = VectorSchemaRoot.create(schema, allocator); + IntVector intVectorA = (IntVector) root.getVector("age"); + VarCharVector varCharVectorB = (VarCharVector) root.getVector("name"); + ){ + root.setRowCount(3); + intVectorA.allocateNew(3); + intVectorA.set(0, 10); + intVectorA.set(1, 20); + intVectorA.set(2, 30); + varCharVectorB.allocateNew(3); + varCharVectorB.set(0, "Dave".getBytes(StandardCharsets.UTF_8)); + varCharVectorB.set(1, "Peter".getBytes(StandardCharsets.UTF_8)); + varCharVectorB.set(2, "Mary".getBytes(StandardCharsets.UTF_8)); + System.out.println("VectorSchemaRoot created: \n" + root.contentToTSVString()); + } + +.. code-block:: shell + + VectorSchemaRoot created: + age name + 10 Dave + 20 Peter + 30 Mary + + +Interprocess Communication (IPC) +******************************** + +Arrow data can be written to and read from disk, and both of these can be done in +a streaming and/or random-access fashion depending on application requirements. 
+ +**Write data to an arrow file** + +Example: Write the dataset from the previous example to an Arrow random-access file. + +.. code-block:: Java + + + import org.apache.arrow.memory.BufferAllocator; + import org.apache.arrow.memory.RootAllocator; + import org.apache.arrow.vector.IntVector; + import org.apache.arrow.vector.VarCharVector; + import org.apache.arrow.vector.VectorSchemaRoot; + import org.apache.arrow.vector.ipc.ArrowFileWriter; + import org.apache.arrow.vector.types.pojo.ArrowType; + import org.apache.arrow.vector.types.pojo.Field; + import org.apache.arrow.vector.types.pojo.FieldType; + import org.apache.arrow.vector.types.pojo.Schema; + import java.io.File; + import java.io.FileOutputStream; + import java.io.IOException; + import java.nio.charset.StandardCharsets; + import java.util.HashMap; + import java.util.Map; + import static java.util.Arrays.asList; + + Map<String, String> metadataField = new HashMap<>(); + metadataField.put("K1-Field", "K1F1"); + metadataField.put("K2-Field", "K2F2"); + Field a = new Field("Column-A-Age",

Review comment:

```suggestion
Field age = new Field("age",
```

########## File path: docs/source/java/getting_started.rst ##########
@@ -0,0 +1,33 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor license agreements. See the NOTICE file +.. distributed with this work for additional information +.. regarding copyright ownership. The ASF licenses this file +.. to you under the Apache License, Version 2.0 (the +.. "License"); you may not use this file except in compliance +.. with the License. You may obtain a copy of the License at + +.. http://www.apache.org/licenses/LICENSE-2.0 + +.. Unless required by applicable law or agreed to in writing, +.. software distributed under the License is distributed on an +.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +.. KIND, either express or implied. See the License for the +.. specific language governing permissions and limitations +..
under the License. + +.. default-domain:: java +.. highlight:: java + +User Guide +========== + +.. toctree::

Review comment:

This still doesn't seem right - keep the toctree in index.rst, and move the contents of "quickstartguide.rst" into this file.

########## File path: docs/source/java/quickstartguide.rst ##########
@@ -0,0 +1,325 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor license agreements. See the NOTICE file +.. distributed with this work for additional information +.. regarding copyright ownership. The ASF licenses this file +.. to you under the Apache License, Version 2.0 (the +.. "License"); you may not use this file except in compliance +.. with the License. You may obtain a copy of the License at + +.. http://www.apache.org/licenses/LICENSE-2.0 + +.. Unless required by applicable law or agreed to in writing, +.. software distributed under the License is distributed on an +.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +.. KIND, either express or implied. See the License for the +.. specific language governing permissions and limitations +.. under the License. + +.. default-domain:: java +.. highlight:: java + +================= +Quick Start Guide +================= + +.. contents:: + +Arrow Java provides several building blocks. Data types describe the types of values; +``ValueVectors`` are sequences of typed values; ``fields`` describe the types of columns in +tabular data; ``schemas`` describe a sequence of columns in tabular data, and +``VectorSchemaRoot`` represents tabular data. Arrow also provides ``readers`` and +``writers`` for loading data from and persisting data to storage. + +Create a ValueVector +*********************

Review comment:

```suggestion
Create a ValueVector
********************
```

########## File path: docs/source/java/quickstartguide.rst ##########
@@ -0,0 +1,325 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +..
or more contributor license agreements. See the NOTICE file +.. distributed with this work for additional information +.. regarding copyright ownership. The ASF licenses this file +.. to you under the Apache License, Version 2.0 (the +.. "License"); you may not use this file except in compliance +.. with the License. You may obtain a copy of the License at + +.. http://www.apache.org/licenses/LICENSE-2.0 + +.. Unless required by applicable law or agreed to in writing, +.. software distributed under the License is distributed on an +.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +.. KIND, either express or implied. See the License for the +.. specific language governing permissions and limitations +.. under the License. + +.. default-domain:: java +.. highlight:: java + +================= +Quick Start Guide +================= + +.. contents:: + +Arrow Java provides several building blocks. Data types describe the types of values; +``ValueVectors`` are sequences of typed values; ``fields`` describe the types of columns in +tabular data; ``schemas`` describe a sequence of columns in tabular data, and +``VectorSchemaRoot`` represents tabular data. Arrow also provides ``readers`` and +``writers`` for loading data from and persisting data to storage. + +Create a ValueVector +********************* + +**ValueVectors** represent a sequence of values of the same type. +They are also known as "arrays" in the columnar format. + +Example: create a vector of 32-bit integers representing ``[1, null, 2]``: + +.. 
code-block:: Java + + import org.apache.arrow.memory.BufferAllocator; + import org.apache.arrow.memory.RootAllocator; + import org.apache.arrow.vector.IntVector; + + try( + BufferAllocator allocator = new RootAllocator(); + IntVector intVector = new IntVector("fixed-size-primitive-layout", allocator); + ){ + intVector.allocateNew(3); + intVector.set(0,1); + intVector.setNull(1); + intVector.set(2,2); + intVector.setValueCount(3); + System.out.println("Vector created in memory: " + intVector); + } + +.. code-block:: shell + + Vector created in memory: [1, null, 2] + + +Example: create a vector of UTF-8 encoded strings representing ``["one", "two", "three"]``: + +.. code-block:: Java + + import org.apache.arrow.memory.BufferAllocator; + import org.apache.arrow.memory.RootAllocator; + import org.apache.arrow.vector.VarCharVector; + + try( + BufferAllocator allocator = new RootAllocator(); + VarCharVector varCharVector = new VarCharVector("variable-size-primitive-layout", allocator); + ){ + varCharVector.allocateNew(3); + varCharVector.set(0, "one".getBytes()); + varCharVector.set(1, "two".getBytes()); + varCharVector.set(2, "three".getBytes()); + varCharVector.setValueCount(3); + System.out.println("Vector created in memory: " + varCharVector); + } + +.. code-block:: shell + + Vector created in memory: [one, two, three] + +Create a Field +************** + +**Fields** are used to denote the particular columns of tabular data. +They consist of a name, a data type, a flag indicating whether the column can have null values, +and optional key-value metadata. + +Example: create a field named "document" of string type: + +.. 
code-block:: Java + + import org.apache.arrow.vector.types.pojo.ArrowType; + import org.apache.arrow.vector.types.pojo.Field; + import org.apache.arrow.vector.types.pojo.FieldType; + import java.util.HashMap; + import java.util.Map; + + Map<String, String> metadata = new HashMap<>(); + metadata.put("A", "Id card"); + metadata.put("B", "Passport"); + metadata.put("C", "Visa"); + Field document = new Field("document", + new FieldType(true, new ArrowType.Utf8(), /*dictionary*/ null, metadata), + /*children*/ null); + System.out.println("Field created: " + document + ", Metadata: " + document.getMetadata()); + +.. code-block:: shell + + Field created: document: Utf8, Metadata: {A=Id card, B=Passport, C=Visa} + +Create a Schema +*************** + +**Schema** holds a sequence of fields together with some optional metadata. + +Example: Create a schema describing datasets with two columns: +a int32 column "A" and a utf8-encoded string column "B"

Review comment:

```suggestion
an int32 column "A" and a UTF8-encoded string column "B"
```

########## File path: docs/source/java/quickstartguide.rst ##########
@@ -0,0 +1,325 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor license agreements. See the NOTICE file +.. distributed with this work for additional information +.. regarding copyright ownership. The ASF licenses this file +.. to you under the Apache License, Version 2.0 (the +.. "License"); you may not use this file except in compliance +.. with the License. You may obtain a copy of the License at + +.. http://www.apache.org/licenses/LICENSE-2.0 + +.. Unless required by applicable law or agreed to in writing, +.. software distributed under the License is distributed on an +.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +.. KIND, either express or implied. See the License for the +.. specific language governing permissions and limitations +.. under the License. + +.. default-domain:: java +..
highlight:: java + +================= +Quick Start Guide +================= + +.. contents:: + +Arrow Java provides several building blocks. Data types describe the types of values; +``ValueVectors`` are sequences of typed values; ``fields`` describe the types of columns in +tabular data; ``schemas`` describe a sequence of columns in tabular data, and +``VectorSchemaRoot`` represents tabular data. Arrow also provides ``readers`` and +``writers`` for loading data from and persisting data to storage. + +Create a ValueVector +********************* + +**ValueVectors** represent a sequence of values of the same type. +They are also known as "arrays" in the columnar format. + +Example: create a vector of 32-bit integers representing ``[1, null, 2]``: + +.. code-block:: Java + + import org.apache.arrow.memory.BufferAllocator; + import org.apache.arrow.memory.RootAllocator; + import org.apache.arrow.vector.IntVector; + + try( + BufferAllocator allocator = new RootAllocator(); + IntVector intVector = new IntVector("fixed-size-primitive-layout", allocator); + ){ + intVector.allocateNew(3); + intVector.set(0,1); + intVector.setNull(1); + intVector.set(2,2); + intVector.setValueCount(3); + System.out.println("Vector created in memory: " + intVector); + } + +.. code-block:: shell + + Vector created in memory: [1, null, 2] + + +Example: create a vector of UTF-8 encoded strings representing ``["one", "two", "three"]``: + +.. 
code-block:: Java + + import org.apache.arrow.memory.BufferAllocator; + import org.apache.arrow.memory.RootAllocator; + import org.apache.arrow.vector.VarCharVector; + + try( + BufferAllocator allocator = new RootAllocator(); + VarCharVector varCharVector = new VarCharVector("variable-size-primitive-layout", allocator); + ){ + varCharVector.allocateNew(3); + varCharVector.set(0, "one".getBytes()); + varCharVector.set(1, "two".getBytes()); + varCharVector.set(2, "three".getBytes()); + varCharVector.setValueCount(3); + System.out.println("Vector created in memory: " + varCharVector); + } + +.. code-block:: shell + + Vector created in memory: [one, two, three] + +Create a Field +************** + +**Fields** are used to denote the particular columns of tabular data. +They consist of a name, a data type, a flag indicating whether the column can have null values, +and optional key-value metadata. + +Example: create a field named "document" of string type: + +.. code-block:: Java + + import org.apache.arrow.vector.types.pojo.ArrowType; + import org.apache.arrow.vector.types.pojo.Field; + import org.apache.arrow.vector.types.pojo.FieldType; + import java.util.HashMap; + import java.util.Map; + + Map<String, String> metadata = new HashMap<>(); + metadata.put("A", "Id card"); + metadata.put("B", "Passport"); + metadata.put("C", "Visa"); + Field document = new Field("document", + new FieldType(true, new ArrowType.Utf8(), /*dictionary*/ null, metadata), + /*children*/ null); + System.out.println("Field created: " + document + ", Metadata: " + document.getMetadata()); + +.. code-block:: shell + + Field created: document: Utf8, Metadata: {A=Id card, B=Passport, C=Visa} + +Create a Schema +*************** + +**Schema** holds a sequence of fields together with some optional metadata. + +Example: Create a schema describing datasets with two columns: +a int32 column "A" and a utf8-encoded string column "B" + +.. 
code-block:: Java + + import org.apache.arrow.vector.types.pojo.ArrowType; + import org.apache.arrow.vector.types.pojo.Field; + import org.apache.arrow.vector.types.pojo.FieldType; + import org.apache.arrow.vector.types.pojo.Schema; + import java.util.HashMap; + import java.util.Map; + import static java.util.Arrays.asList; + + Map<String, String> metadata = new HashMap<>(); + metadata.put("K1", "V1"); + metadata.put("K2", "V2"); + Field a = new Field("A", FieldType.nullable(new ArrowType.Int(32, true)), /*children*/ null); + Field b = new Field("B", FieldType.nullable(new ArrowType.Utf8()), /*children*/ null); + Schema schema = new Schema(asList(a, b), metadata); + System.out.println("Schema created: " + schema); + +.. code-block:: shell + + Schema created: Schema<A: Int(32, true), B: Utf8>(metadata: {K1=V1, K2=V2}) + +Create a VectorSchemaRoot +************************* + +**VectorSchemaRoot** combines ValueVectors with a Schema to represent tabular data. + +Example: Create a dataset with metadata that contains integer age and +string names of data. + +.. code-block:: Java + + import org.apache.arrow.memory.BufferAllocator; + import org.apache.arrow.memory.RootAllocator; + import org.apache.arrow.vector.IntVector; + import org.apache.arrow.vector.VarCharVector; + import org.apache.arrow.vector.VectorSchemaRoot; + import org.apache.arrow.vector.types.pojo.ArrowType; + import org.apache.arrow.vector.types.pojo.Field; + import org.apache.arrow.vector.types.pojo.FieldType; + import org.apache.arrow.vector.types.pojo.Schema; + import java.nio.charset.StandardCharsets; + import java.util.HashMap; + import java.util.Map; + import static java.util.Arrays.asList; + + Field a = new Field("age",

Review comment:

```suggestion
Field age = new Field("age",
```

########## File path: docs/source/java/quickstartguide.rst ##########
@@ -0,0 +1,325 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor license agreements. See the NOTICE file +..
distributed with this work for additional information +.. regarding copyright ownership. The ASF licenses this file +.. to you under the Apache License, Version 2.0 (the +.. "License"); you may not use this file except in compliance +.. with the License. You may obtain a copy of the License at + +.. http://www.apache.org/licenses/LICENSE-2.0 + +.. Unless required by applicable law or agreed to in writing, +.. software distributed under the License is distributed on an +.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +.. KIND, either express or implied. See the License for the +.. specific language governing permissions and limitations +.. under the License. + +.. default-domain:: java +.. highlight:: java + +================= +Quick Start Guide +================= + +.. contents:: + +Arrow Java provides several building blocks. Data types describe the types of values; +``ValueVectors`` are sequences of typed values; ``fields`` describe the types of columns in +tabular data; ``schemas`` describe a sequence of columns in tabular data, and +``VectorSchemaRoot`` represents tabular data. Arrow also provides ``readers`` and +``writers`` for loading data from and persisting data to storage. + +Create a ValueVector +********************* + +**ValueVectors** represent a sequence of values of the same type. +They are also known as "arrays" in the columnar format. + +Example: create a vector of 32-bit integers representing ``[1, null, 2]``: + +.. code-block:: Java + + import org.apache.arrow.memory.BufferAllocator; + import org.apache.arrow.memory.RootAllocator; + import org.apache.arrow.vector.IntVector; + + try( + BufferAllocator allocator = new RootAllocator(); + IntVector intVector = new IntVector("fixed-size-primitive-layout", allocator); + ){ + intVector.allocateNew(3); + intVector.set(0,1); + intVector.setNull(1); + intVector.set(2,2); + intVector.setValueCount(3); + System.out.println("Vector created in memory: " + intVector); + } + +.. 
code-block:: shell + + Vector created in memory: [1, null, 2] + + +Example: create a vector of UTF-8 encoded strings representing ``["one", "two", "three"]``: + +.. code-block:: Java + + import org.apache.arrow.memory.BufferAllocator; + import org.apache.arrow.memory.RootAllocator; + import org.apache.arrow.vector.VarCharVector; + + try( + BufferAllocator allocator = new RootAllocator(); + VarCharVector varCharVector = new VarCharVector("variable-size-primitive-layout", allocator); + ){ + varCharVector.allocateNew(3); + varCharVector.set(0, "one".getBytes()); + varCharVector.set(1, "two".getBytes()); + varCharVector.set(2, "three".getBytes()); + varCharVector.setValueCount(3); + System.out.println("Vector created in memory: " + varCharVector); + } + +.. code-block:: shell + + Vector created in memory: [one, two, three] + +Create a Field +************** + +**Fields** are used to denote the particular columns of tabular data. +They consist of a name, a data type, a flag indicating whether the column can have null values, +and optional key-value metadata. + +Example: create a field named "document" of string type: + +.. code-block:: Java + + import org.apache.arrow.vector.types.pojo.ArrowType; + import org.apache.arrow.vector.types.pojo.Field; + import org.apache.arrow.vector.types.pojo.FieldType; + import java.util.HashMap; + import java.util.Map; + + Map<String, String> metadata = new HashMap<>(); + metadata.put("A", "Id card"); + metadata.put("B", "Passport"); + metadata.put("C", "Visa"); + Field document = new Field("document", + new FieldType(true, new ArrowType.Utf8(), /*dictionary*/ null, metadata), + /*children*/ null); + System.out.println("Field created: " + document + ", Metadata: " + document.getMetadata()); + +.. code-block:: shell + + Field created: document: Utf8, Metadata: {A=Id card, B=Passport, C=Visa} + +Create a Schema +*************** + +**Schema** holds a sequence of fields together with some optional metadata. 
+ +Example: Create a schema describing datasets with two columns: +a int32 column "A" and a utf8-encoded string column "B" + +.. code-block:: Java + + import org.apache.arrow.vector.types.pojo.ArrowType; + import org.apache.arrow.vector.types.pojo.Field; + import org.apache.arrow.vector.types.pojo.FieldType; + import org.apache.arrow.vector.types.pojo.Schema; + import java.util.HashMap; + import java.util.Map; + import static java.util.Arrays.asList; + + Map<String, String> metadata = new HashMap<>(); + metadata.put("K1", "V1"); + metadata.put("K2", "V2"); + Field a = new Field("A", FieldType.nullable(new ArrowType.Int(32, true)), /*children*/ null); + Field b = new Field("B", FieldType.nullable(new ArrowType.Utf8()), /*children*/ null); + Schema schema = new Schema(asList(a, b), metadata); + System.out.println("Schema created: " + schema); + +.. code-block:: shell + + Schema created: Schema<A: Int(32, true), B: Utf8>(metadata: {K1=V1, K2=V2}) + +Create a VectorSchemaRoot +************************* + +**VectorSchemaRoot** combines ValueVectors with a Schema to represent tabular data.

Review comment:

```suggestion
A **VectorSchemaRoot** combines ValueVectors with a Schema to represent tabular data.
```

########## File path: docs/source/java/quickstartguide.rst ##########
@@ -0,0 +1,325 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor license agreements. See the NOTICE file +.. distributed with this work for additional information +.. regarding copyright ownership. The ASF licenses this file +.. to you under the Apache License, Version 2.0 (the +.. "License"); you may not use this file except in compliance +.. with the License. You may obtain a copy of the License at + +.. http://www.apache.org/licenses/LICENSE-2.0 + +.. Unless required by applicable law or agreed to in writing, +.. software distributed under the License is distributed on an +.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +..
KIND, either express or implied. See the License for the +.. specific language governing permissions and limitations +.. under the License. + +.. default-domain:: java +.. highlight:: java + +================= +Quick Start Guide +================= + +.. contents:: + +Arrow Java provides several building blocks. Data types describe the types of values; +``ValueVectors`` are sequences of typed values; ``fields`` describe the types of columns in +tabular data; ``schemas`` describe a sequence of columns in tabular data, and +``VectorSchemaRoot`` represents tabular data. Arrow also provides ``readers`` and +``writers`` for loading data from and persisting data to storage. + +Create a ValueVector +********************* + +**ValueVectors** represent a sequence of values of the same type. +They are also known as "arrays" in the columnar format. + +Example: create a vector of 32-bit integers representing ``[1, null, 2]``: + +.. code-block:: Java + + import org.apache.arrow.memory.BufferAllocator; + import org.apache.arrow.memory.RootAllocator; + import org.apache.arrow.vector.IntVector; + + try( + BufferAllocator allocator = new RootAllocator(); + IntVector intVector = new IntVector("fixed-size-primitive-layout", allocator); + ){ + intVector.allocateNew(3); + intVector.set(0,1); + intVector.setNull(1); + intVector.set(2,2); + intVector.setValueCount(3); + System.out.println("Vector created in memory: " + intVector); + } + +.. code-block:: shell + + Vector created in memory: [1, null, 2] + + +Example: create a vector of UTF-8 encoded strings representing ``["one", "two", "three"]``: + +.. 
code-block:: Java + + import org.apache.arrow.memory.BufferAllocator; + import org.apache.arrow.memory.RootAllocator; + import org.apache.arrow.vector.VarCharVector; + + try( + BufferAllocator allocator = new RootAllocator(); + VarCharVector varCharVector = new VarCharVector("variable-size-primitive-layout", allocator); + ){ + varCharVector.allocateNew(3); + varCharVector.set(0, "one".getBytes()); + varCharVector.set(1, "two".getBytes()); + varCharVector.set(2, "three".getBytes()); + varCharVector.setValueCount(3); + System.out.println("Vector created in memory: " + varCharVector); + } + +.. code-block:: shell + + Vector created in memory: [one, two, three] + +Create a Field +************** + +**Fields** are used to denote the particular columns of tabular data. +They consist of a name, a data type, a flag indicating whether the column can have null values, +and optional key-value metadata. + +Example: create a field named "document" of string type: + +.. code-block:: Java + + import org.apache.arrow.vector.types.pojo.ArrowType; + import org.apache.arrow.vector.types.pojo.Field; + import org.apache.arrow.vector.types.pojo.FieldType; + import java.util.HashMap; + import java.util.Map; + + Map<String, String> metadata = new HashMap<>(); + metadata.put("A", "Id card"); + metadata.put("B", "Passport"); + metadata.put("C", "Visa"); + Field document = new Field("document", + new FieldType(true, new ArrowType.Utf8(), /*dictionary*/ null, metadata), + /*children*/ null); + System.out.println("Field created: " + document + ", Metadata: " + document.getMetadata()); + +.. code-block:: shell + + Field created: document: Utf8, Metadata: {A=Id card, B=Passport, C=Visa} + +Create a Schema +*************** + +**Schema** holds a sequence of fields together with some optional metadata. + +Example: Create a schema describing datasets with two columns: +a int32 column "A" and a utf8-encoded string column "B" + +.. 
code-block:: Java + + import org.apache.arrow.vector.types.pojo.ArrowType; + import org.apache.arrow.vector.types.pojo.Field; + import org.apache.arrow.vector.types.pojo.FieldType; + import org.apache.arrow.vector.types.pojo.Schema; + import java.util.HashMap; + import java.util.Map; + import static java.util.Arrays.asList; + + Map<String, String> metadata = new HashMap<>(); + metadata.put("K1", "V1"); + metadata.put("K2", "V2"); + Field a = new Field("A", FieldType.nullable(new ArrowType.Int(32, true)), /*children*/ null); + Field b = new Field("B", FieldType.nullable(new ArrowType.Utf8()), /*children*/ null); + Schema schema = new Schema(asList(a, b), metadata); + System.out.println("Schema created: " + schema); + +.. code-block:: shell + + Schema created: Schema<A: Int(32, true), B: Utf8>(metadata: {K1=V1, K2=V2}) + +Create a VectorSchemaRoot +************************* + +**VectorSchemaRoot** combines ValueVectors with a Schema to represent tabular data. + +Example: Create a dataset with metadata that contains integer age and +string names of data. + +.. 
code-block:: Java + + import org.apache.arrow.memory.BufferAllocator; + import org.apache.arrow.memory.RootAllocator; + import org.apache.arrow.vector.IntVector; + import org.apache.arrow.vector.VarCharVector; + import org.apache.arrow.vector.VectorSchemaRoot; + import org.apache.arrow.vector.types.pojo.ArrowType; + import org.apache.arrow.vector.types.pojo.Field; + import org.apache.arrow.vector.types.pojo.FieldType; + import org.apache.arrow.vector.types.pojo.Schema; + import java.nio.charset.StandardCharsets; + import java.util.HashMap; + import java.util.Map; + import static java.util.Arrays.asList; + + Field a = new Field("age", + FieldType.nullable(new ArrowType.Int(32, true)), + /*children*/null + ); + Field b = new Field("name", + new FieldType(true, new ArrowType.Utf8(), /*dictionary*/ null, /*metadata*/ null), + /*children*/null + ); + Schema schema = new Schema(asList(a, b), /*metadata*/ null); + try( + BufferAllocator allocator = new RootAllocator(); + VectorSchemaRoot root = VectorSchemaRoot.create(schema, allocator); + IntVector intVectorA = (IntVector) root.getVector("age"); + VarCharVector varCharVectorB = (VarCharVector) root.getVector("name");

Review comment:

```suggestion
IntVector ageVector = (IntVector) root.getVector("age");
VarCharVector nameVector = (VarCharVector) root.getVector("name");
```

########## File path: docs/source/java/quickstartguide.rst ##########
@@ -0,0 +1,325 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor license agreements. See the NOTICE file +.. distributed with this work for additional information +.. regarding copyright ownership. The ASF licenses this file +.. to you under the Apache License, Version 2.0 (the +.. "License"); you may not use this file except in compliance +.. with the License. You may obtain a copy of the License at + +.. http://www.apache.org/licenses/LICENSE-2.0 + +.. Unless required by applicable law or agreed to in writing, +..
software distributed under the License is distributed on an +.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +.. KIND, either express or implied. See the License for the +.. specific language governing permissions and limitations +.. under the License. + +.. default-domain:: java +.. highlight:: java + +================= +Quick Start Guide +================= + +.. contents:: + +Arrow Java provides several building blocks. Data types describe the types of values; +``ValueVectors`` are sequences of typed values; ``fields`` describe the types of columns in +tabular data; ``schemas`` describe a sequence of columns in tabular data, and +``VectorSchemaRoot`` represents tabular data. Arrow also provides ``readers`` and +``writers`` for loading data from and persisting data to storage. + +Create a ValueVector +********************* + +**ValueVectors** represent a sequence of values of the same type. +They are also known as "arrays" in the columnar format. + +Example: create a vector of 32-bit integers representing ``[1, null, 2]``: + +.. code-block:: Java + + import org.apache.arrow.memory.BufferAllocator; + import org.apache.arrow.memory.RootAllocator; + import org.apache.arrow.vector.IntVector; + + try( + BufferAllocator allocator = new RootAllocator(); + IntVector intVector = new IntVector("fixed-size-primitive-layout", allocator); + ){ + intVector.allocateNew(3); + intVector.set(0,1); + intVector.setNull(1); + intVector.set(2,2); + intVector.setValueCount(3); + System.out.println("Vector created in memory: " + intVector); + } + +.. code-block:: shell + + Vector created in memory: [1, null, 2] + + +Example: create a vector of UTF-8 encoded strings representing ``["one", "two", "three"]``: + +.. 
code-block:: Java + + import org.apache.arrow.memory.BufferAllocator; + import org.apache.arrow.memory.RootAllocator; + import org.apache.arrow.vector.VarCharVector; + + try( + BufferAllocator allocator = new RootAllocator(); + VarCharVector varCharVector = new VarCharVector("variable-size-primitive-layout", allocator); + ){ + varCharVector.allocateNew(3); + varCharVector.set(0, "one".getBytes()); + varCharVector.set(1, "two".getBytes()); + varCharVector.set(2, "three".getBytes()); + varCharVector.setValueCount(3); + System.out.println("Vector created in memory: " + varCharVector); + } + +.. code-block:: shell + + Vector created in memory: [one, two, three] + +Create a Field +************** + +**Fields** are used to denote the particular columns of tabular data. +They consist of a name, a data type, a flag indicating whether the column can have null values, +and optional key-value metadata. + +Example: create a field named "document" of string type: + +.. code-block:: Java + + import org.apache.arrow.vector.types.pojo.ArrowType; + import org.apache.arrow.vector.types.pojo.Field; + import org.apache.arrow.vector.types.pojo.FieldType; + import java.util.HashMap; + import java.util.Map; + + Map<String, String> metadata = new HashMap<>(); + metadata.put("A", "Id card"); + metadata.put("B", "Passport"); + metadata.put("C", "Visa"); + Field document = new Field("document", + new FieldType(true, new ArrowType.Utf8(), /*dictionary*/ null, metadata), + /*children*/ null); + System.out.println("Field created: " + document + ", Metadata: " + document.getMetadata()); + +.. code-block:: shell + + Field created: document: Utf8, Metadata: {A=Id card, B=Passport, C=Visa} + +Create a Schema +*************** + +**Schema** holds a sequence of fields together with some optional metadata. + +Example: Create a schema describing datasets with two columns: +a int32 column "A" and a utf8-encoded string column "B" + +.. 
code-block:: Java + + import org.apache.arrow.vector.types.pojo.ArrowType; + import org.apache.arrow.vector.types.pojo.Field; + import org.apache.arrow.vector.types.pojo.FieldType; + import org.apache.arrow.vector.types.pojo.Schema; + import java.util.HashMap; + import java.util.Map; + import static java.util.Arrays.asList; + + Map<String, String> metadata = new HashMap<>(); + metadata.put("K1", "V1"); + metadata.put("K2", "V2"); + Field a = new Field("A", FieldType.nullable(new ArrowType.Int(32, true)), /*children*/ null); + Field b = new Field("B", FieldType.nullable(new ArrowType.Utf8()), /*children*/ null); + Schema schema = new Schema(asList(a, b), metadata); + System.out.println("Schema created: " + schema); + +.. code-block:: shell + + Schema created: Schema<A: Int(32, true), B: Utf8>(metadata: {K1=V1, K2=V2}) + +Create a VectorSchemaRoot +************************* + +**VectorSchemaRoot** combines ValueVectors with a Schema to represent tabular data. + +Example: Create a dataset with metadata that contains integer age and +string names of data. + +.. code-block:: Java + + import org.apache.arrow.memory.BufferAllocator; + import org.apache.arrow.memory.RootAllocator; + import org.apache.arrow.vector.IntVector; + import org.apache.arrow.vector.VarCharVector; + import org.apache.arrow.vector.VectorSchemaRoot; + import org.apache.arrow.vector.types.pojo.ArrowType; + import org.apache.arrow.vector.types.pojo.Field; + import org.apache.arrow.vector.types.pojo.FieldType; + import org.apache.arrow.vector.types.pojo.Schema; + import java.nio.charset.StandardCharsets; + import java.util.HashMap; + import java.util.Map; + import static java.util.Arrays.asList; + + Field a = new Field("age", + FieldType.nullable(new ArrowType.Int(32, true)), + /*children*/null + ); + Field b = new Field("name",

Review comment:

```suggestion
Field name = new Field("name",
```

########## File path: docs/source/java/quickstartguide.rst ##########
@@ -0,0 +1,325 @@ +..
+ +Example: Create a dataset with metadata that contains integer age and +string names of data. + +.. code-block:: Java + + import org.apache.arrow.memory.BufferAllocator; + import org.apache.arrow.memory.RootAllocator; + import org.apache.arrow.vector.IntVector; + import org.apache.arrow.vector.VarCharVector; + import org.apache.arrow.vector.VectorSchemaRoot; + import org.apache.arrow.vector.types.pojo.ArrowType; + import org.apache.arrow.vector.types.pojo.Field; + import org.apache.arrow.vector.types.pojo.FieldType; + import org.apache.arrow.vector.types.pojo.Schema; + import java.nio.charset.StandardCharsets; + import java.util.HashMap; + import java.util.Map; + import static java.util.Arrays.asList; + + Field a = new Field("age", + FieldType.nullable(new ArrowType.Int(32, true)), + /*children*/null Review comment: Does Field require you to pass children? ########## File path: docs/source/java/quickstartguide.rst ########## @@ -0,0 +1,325 @@ +.. 
code-block:: Java + + import org.apache.arrow.memory.BufferAllocator; + import org.apache.arrow.memory.RootAllocator; + import org.apache.arrow.vector.IntVector; + import org.apache.arrow.vector.VarCharVector; + import org.apache.arrow.vector.VectorSchemaRoot; + import org.apache.arrow.vector.types.pojo.ArrowType; + import org.apache.arrow.vector.types.pojo.Field; + import org.apache.arrow.vector.types.pojo.FieldType; + import org.apache.arrow.vector.types.pojo.Schema; + import java.nio.charset.StandardCharsets; + import java.util.HashMap; + import java.util.Map; + import static java.util.Arrays.asList; + + Field a = new Field("age", + FieldType.nullable(new ArrowType.Int(32, true)), + /*children*/null + ); + Field b = new Field("name", + new FieldType(true, new ArrowType.Utf8(), /*dictionary*/ null, /*metadata*/ null), + /*children*/null + ); + Schema schema = new Schema(asList(a, b), /*metadata*/ null); + try( + BufferAllocator allocator = new RootAllocator(); + VectorSchemaRoot root = VectorSchemaRoot.create(schema, allocator); + IntVector intVectorA = (IntVector) root.getVector("age"); + VarCharVector varCharVectorB = (VarCharVector) root.getVector("name"); + ){ + root.setRowCount(3); + intVectorA.allocateNew(3); + intVectorA.set(0, 10); + intVectorA.set(1, 20); + intVectorA.set(2, 30); + varCharVectorB.allocateNew(3); + varCharVectorB.set(0, "Dave".getBytes(StandardCharsets.UTF_8)); + varCharVectorB.set(1, "Peter".getBytes(StandardCharsets.UTF_8)); + varCharVectorB.set(2, "Mary".getBytes(StandardCharsets.UTF_8)); + System.out.println("VectorSchemaRoot created: \n" + root.contentToTSVString()); + } + +.. code-block:: shell + + VectorSchemaRoot created: + age name + 10 Dave + 20 Peter + 30 Mary + + +Interprocess Communication (IPC) +******************************** + +Arrow data can be written to and read from disk, and both of these can be done in +a streaming and/or random-access fashion depending on application requirements. 
+ +**Write data to an arrow file** + +Example: Write the dataset from the previous example to an Arrow random-access file. + +.. code-block:: Java + + + import org.apache.arrow.memory.BufferAllocator; + import org.apache.arrow.memory.RootAllocator; + import org.apache.arrow.vector.IntVector; + import org.apache.arrow.vector.VarCharVector; + import org.apache.arrow.vector.VectorSchemaRoot; + import org.apache.arrow.vector.ipc.ArrowFileWriter; + import org.apache.arrow.vector.types.pojo.ArrowType; + import org.apache.arrow.vector.types.pojo.Field; + import org.apache.arrow.vector.types.pojo.FieldType; + import org.apache.arrow.vector.types.pojo.Schema; + import java.io.File; + import java.io.FileOutputStream; + import java.io.IOException; + import java.nio.charset.StandardCharsets; + import java.util.HashMap; + import java.util.Map; + import static java.util.Arrays.asList; + + Map<String, String> metadataField = new HashMap<>(); + metadataField.put("K1-Field", "K1F1"); + metadataField.put("K2-Field", "K2F2"); + Field a = new Field("Column-A-Age", Review comment: and similar below ########## File path: docs/source/java/quickstartguide.rst ########## @@ -0,0 +1,325 @@
+ +**Write data to an arrow file** + +Example: Write the dataset from the previous example to an Arrow random-access file. + +.. code-block:: Java + + Review comment: ```suggestion ```
