This is an automated email from the ASF dual-hosted git repository. agoncharuk pushed a commit to branch ignite-13618 in repository https://gitbox.apache.org/repos/asf/ignite-3.git
The following commit(s) were added to refs/heads/ignite-13618 by this push:
     new 021f826  IGNITE-13618 Corrected a few checks for bytecode module, moved pieces of IEP-54 to module README.md
021f826 is described below

commit 021f82695b4faa722133416599510bd071d75df7
Author: Alexey Goncharuk <alexey.goncha...@gmail.com>
AuthorDate: Thu Mar 11 20:48:21 2021 +0300

    IGNITE-13618 Corrected a few checks for bytecode module, moved pieces of IEP-54 to module README.md
---
 modules/bytecode/README.md                         |  4 +-
 .../facebook/presto/bytecode/MethodDefinition.java |  8 +-
 .../presto/bytecode/MethodGenerationContext.java   |  2 +-
 modules/schema/README.md                           | 50 ++++++++++++-
 .../org/apache/ignite/internal/schema/README.md    | 87 ++++++++++++++++++++++
 .../ignite/internal/schema/package-info.java       | 46 ------------
 6 files changed, 139 insertions(+), 58 deletions(-)

diff --git a/modules/bytecode/README.md b/modules/bytecode/README.md
index 0135e01..3a178c8 100644
--- a/modules/bytecode/README.md
+++ b/modules/bytecode/README.md
@@ -1,4 +1,6 @@
 # Apache Ignite Bytecode module
-Fork of PrestoDB Bytecode module (ver 0.243).
+Fork of [PrestoDB Bytecode module (ver 0.243)](https://github.com/prestodb/presto/tree/0.243/presto-bytecode).
 * Removed unnecessary guava dependency.
 * Tests migrated from TestNG to JUnit 5.
+
+This module provides a convenient thin wrapper around [ASM](https://asm.ow2.io/) library to generate classes at runtime.
\ No newline at end of file
diff --git a/modules/bytecode/src/main/java/com/facebook/presto/bytecode/MethodDefinition.java b/modules/bytecode/src/main/java/com/facebook/presto/bytecode/MethodDefinition.java
index 0297d9b..1405765 100644
--- a/modules/bytecode/src/main/java/com/facebook/presto/bytecode/MethodDefinition.java
+++ b/modules/bytecode/src/main/java/com/facebook/presto/bytecode/MethodDefinition.java
@@ -11,6 +11,7 @@
  * See the License for the specific language governing permissions and
  * limitations under the License.
  */
+
 package com.facebook.presto.bytecode;
 
 import java.util.ArrayList;
@@ -67,12 +68,7 @@ public class MethodDefinition {
 
         this.access = access;
         this.name = name;
-        if (returnType != null) {
-            this.returnType = returnType;
-        }
-        else {
-            this.returnType = type(void.class);
-        }
+        this.returnType = returnType != null ? returnType : type(void.class);
         this.parameters = List.copyOf(parameters);
         this.parameterTypes = parameters.stream().map(Parameter::getType).collect(Collectors.toList());
         this.parameterAnnotations = parameters.stream().map(p -> new ArrayList<AnnotationDefinition>()).collect(Collectors.toList());
diff --git a/modules/bytecode/src/main/java/com/facebook/presto/bytecode/MethodGenerationContext.java b/modules/bytecode/src/main/java/com/facebook/presto/bytecode/MethodGenerationContext.java
index ec2a65a..e62da9d 100644
--- a/modules/bytecode/src/main/java/com/facebook/presto/bytecode/MethodGenerationContext.java
+++ b/modules/bytecode/src/main/java/com/facebook/presto/bytecode/MethodGenerationContext.java
@@ -91,7 +91,7 @@ public class MethodGenerationContext {
         return true;
     }
 
-    private final class ScopeContext {
+    private static final class ScopeContext {
         private final Scope scope;
         private final List<Variable> variables;
 
diff --git a/modules/schema/README.md b/modules/schema/README.md
index e742dc5..e640252 100644
--- a/modules/schema/README.md
+++ b/modules/schema/README.md
@@ -1,7 +1,49 @@
 # Schema module
 
-This module provides implementations for schema configuration API and schema management components.
+This module provides API and implementation for schema management components:
 
-* Schema configuration public API implementation.
-* Distributed schema management for processing schema change events at runtime.
-* Schema version management for transparent upgrade stored data purposes according to life-schema concept.
\ No newline at end of file
+* Public API for schema definition and evolution
+* Schema manager component that implements necessary machinery to translate schema management commands to corresponding
+  metastorage modifications, as well as schema modification event processing logic
+* Necessary logic to build and upgrade tuples - rows of specific schema that encode user data in schema-defined format.
+
+## Schema-aware tables
+We require that at any moment in time an Ignite table has only one most recent relevant schema. Upon schema
+modification, we assign a monotonically growing identifier to each version of the cache schema. The ordering guarantees
+are provided by the underlying distributed metastorage. The history of schema versions must be kept in the metastorage
+for a long enough period of time to allow upgrade of all existing data stored in a given table.
+
+Given a schema evolution history, a tuple migration from version `N-k` to version `N` is a straightforward operation.
+We identify fields that were dropped during the last k schema operations and fields that were added (taking into account
+default field values) and update the tuple based on the field modifications. Afterward, the updated tuple is written in
+the schema version `N` layout format. The tuple upgrade may happen on read with an optional writeback or on next update.
+Additionally, tuple upgrade in background is possible.
+
+Since the tuple key hashcode is inlined to the tuple data for quick key lookups, we require that the set of key columns
+do not change during the schema evolution. In the future, we may remove this restriction, but this will require careful
+hashcode calculation adjustments. Removing a column from the key columns does not seem to be possible since it may
+produce duplicates, and we assume PK has no duplicates.
+
+In addition to adding and removing columns, it may be possible to allow for column type migrations when the type change
+is non-ambiguous (a type upcast, e.g.
Int8 → Int16, or by means of a certain expression, e.g. Int8 → String using
+the `CAST` expression).
+
+### Dynamic schema expansion (live schema)
+Ignite can operate in two modes that provide different flexibility level and restrictions wrt object-to-schema mapping:
+ * Strict mode. When a user attempts to insert/update an object to a table, Ignite checks that the object does not
+ contain any extra columns that are not present in the current table schema. If such columns are detected, Ignite will
+ fail the operation requiring the user to manually update the schema before working with added columns.
+ * Live mode. When an object is inserted into a table, we attempt to 'fit' object fields to the schema columns. If the
+ object has some extra fields which are not present in the current schema, the schema is automatically updated to store
+ additional extra fields that are present in the object. If there are two concurrent live schema modifications, they can
+ either merge together if modifications are non-conflicting (e.g. adding disjoint sets of columns or adding columns with
+ the same definition), or one of the modifications will fail (e.g. two columns with the same name, but conflicting type
+ are being inserted). Live schema will try to automatically expand the schema even if there was an explicit drop column
+ command executed right before the live schema expansion. **Live schema never drops columns during automatic schema
+ evolution.** If a schema has columns that were not fulfilled by object fields, they will be either kept `null` or
+ populated with defaults when provided, or the update will fail with an exception.
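Editorial illustration (not part of the commit): the merge rule for concurrent live schema modifications described in the bullet above can be sketched as follows. This is a minimal sketch under stated assumptions: `LiveSchemaMerge` and its `merge` method are hypothetical names, columns are modelled as a plain name-to-type-name map, and none of this reflects the actual Ignite schema manager API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper for illustration only; not part of the Ignite schema module. */
final class LiveSchemaMerge {
    private LiveSchemaMerge() {
    }

    /**
     * Merges the columns (name -> type name) proposed by two concurrent live schema modifications.
     * Returns the merged column set, or an empty Optional when the same column name is proposed
     * with two different types, in which case one of the modifications has to fail.
     */
    static Optional<Map<String, String>> merge(Map<String, String> first, Map<String, String> second) {
        Map<String, String> merged = new LinkedHashMap<>(first);

        for (Map.Entry<String, String> col : second.entrySet()) {
            // putIfAbsent returns the previously mapped type, or null if the column is new.
            String existing = merged.putIfAbsent(col.getKey(), col.getValue());

            if (existing != null && !existing.equals(col.getValue()))
                return Optional.empty(); // Same name, conflicting type.
        }

        return Optional.of(merged);
    }

    public static void main(String[] args) {
        Map<String, String> a = Map.of("age", "Int32");
        Map<String, String> b = Map.of("city", "String");
        Map<String, String> c = Map.of("age", "String");

        System.out.println("disjoint sets merge: " + merge(a, b).isPresent());
        System.out.println("conflicting types merge: " + merge(a, c).isPresent());
    }
}
```

Columns are only ever added in this sketch, which matches the statement that live schema never drops columns during automatic evolution.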
+
+### Data Layout
+Data layout documentation can be found [here](src/main/java/org/apache/ignite/internal/schema/README.md)
+
+## Object-to-schema mapping
diff --git a/modules/schema/src/main/java/org/apache/ignite/internal/schema/README.md b/modules/schema/src/main/java/org/apache/ignite/internal/schema/README.md
new file mode 100644
index 0000000..435c7be
--- /dev/null
+++ b/modules/schema/src/main/java/org/apache/ignite/internal/schema/README.md
@@ -0,0 +1,87 @@
+This package provides necessary infrastructure to create, read, convert to and from POJO classes
+schema-defined tuples.
+
+### Schema definition
+
+Schema is defined as a set of columns which are split into key columns chunk and value columns chunk.
+Each column is defined by a name, nullability flag, and a `org.apache.ignite.internal.schema.NativeType`.
+Type is a thin wrapper over the `org.apache.ignite.internal.schema.NativeTypeSpec` to provide differentiation
+between types of one kind with different size (an example of such differentiation is bitmask(n) or number(n)).
+`org.apache.ignite.internal.schema.NativeTypeSpec` provides necessary indirection to read a column as a
+`java.lang.Object` without needing to switch over the column type.
+
+`NativeType` defines one of the following types:
+
+Type | Size | Description
+---- | ---- | -----------
+Bitmask(n)|⌈n/8⌉ bytes|A fixed-length bitmask of n bits
+Int8|1 byte|1-byte signed integer
+Uint8|1 byte|1-byte unsigned integer
+Int16|2 bytes|2-byte signed integer
+Uint16|2 bytes|2-byte unsigned integer
+Int32|4 bytes|4-byte signed integer
+Uint32|4 bytes|4-byte unsigned integer
+Int64|8 bytes|8-byte signed integer
+Uint64|8 bytes|8-byte unsigned integer
+Float|4 bytes|4-byte floating-point number
+Double|8 bytes|8-byte floating-point number
+Number([n])|Variable|Variable-length number (optionally bound by n bytes in size)
+Decimal|Variable|Variable-length floating-point number
+UUID|16 bytes|UUID
+String|Variable|A string encoded with a given Charset
+Date|3 bytes|A timezone-free date encoded as a year (15 bits), month (4 bits), day (5 bits)
+Time|4 bytes|A timezone-free time encoded as padding (5 bits), hour (5 bits), minute (6 bits), second (6 bits), millisecond (10 bits)
+Datetime|7 bytes|A timezone-free datetime encoded as (date, time)
+Timestamp|8 bytes|Number of milliseconds since Jan 1, 1970 00:00:00.000 (with no timezone)
+Binary|Variable|Variable-size byte array
+
+Arbitrary nested object serialization at this point is not supported, but can be provided in the future by either
+explicit inlining, or by providing an upper-level serialization primitive that will be mapped to a `Binary` column.
+
+### Tuple layout
+A tuple itself does not contain any type metadata and only contains necessary information required for fast column
+lookup. In a tuple, key columns and value columns are separated and written to chunks with identical structure
+(so that chunk is self-sufficient, and, provided with the column types can be read independently).
+
+Tuple structure has the following format:
+
+    ┌─────────────────────────────┬─────────────────────┐
+    │           Header            │        Data         │
+    ├─────────┬─────────┬─────────┼──────────┬──────────┤
+    │ Schema  │ Flags   │ Key     │ Key      │ Value    │
+    │ Version │         │ Hash    │ Chunk    │ Chunk    │
+    ├─────────┼─────────┼─────────┼──────────┼──────────┤
+    │ 2 Bytes │ 2 Bytes │ 4 Bytes │ Variable │ Variable │
+    └─────────┴─────────┴─────────┴──────────┴──────────┘
+
+
+Each chunk section has the following structure:
+
+                                                 ┌──────────────────────────────────────────────────┐
+                                                 │                                                  │
+    ┌─────────┬─────────────────────────┬────────┴────────┬─────────────────────────┬──────────┬────⌄─────┐
+    │ Full    │ Varsize Columns Offsets │ Varsize Columns │ Null-Defaults           │ Fixsize  │ Varsize  │
+    │ Size    │ Table Size              │ Offsets Table   │ Map                     │ Columns  │ Columns  │
+    ├─────────┼─────────────────────────┼─────────────────┼─────────────────────────┼──────────┼──────────┤
+    │ 4 Bytes │ 2 Bytes                 │ Variable        │ ⌈Number of columns / 8⌉ │ Variable │ Variable │
+    └─────────┴─────────────────────────┴─────────────────┴─────────────────────────┴──────────┴──────────┘
+All columns within a chunk are split into groups of fixed-size columns and variable-size columns. Within the group of
+fixsize columns, the columns are sorted by size, then by column name. Within the group of varsize columns, the columns
+are sorted by column name. Inside a tuple default values and nulls are omitted and encoded in the null-defaults map
+(essentially, a bitset). The size of the varsize columns offsets table is equal to the number of non-null non-default
+varsize columns multiplied by 2 (a single entry in the offsets table is 2 bytes). The offset stored in the offsets table
+is calculated from the beginning of the chunk.
+
+### Tuple construction and access
+To assemble a tuple with some schema, an instance of `org.apache.ignite.internal.schema.TupleAssembler`
+must be used which provides the low-level API for building tuples.
When using the tuple assembler, the
+columns must be passed to the assembler in the internal schema sort order. Additionally, when constructing
+the instance of the assembler, the user should pre-calculate the size of the tuple to avoid extra array copies,
+and the number of non-null varlen columns for key and value chunks. Less restrictive building techniques
+are provided by class (de)serializers and tuple builder, which take care of sizing and column order.
+
+To read column values of a tuple, one needs to construct a subclass of
+`org.apache.ignite.internal.schema.Tuple` which provides necessary logic to read arbitrary columns with
+type checking. For primitive types, `org.apache.ignite.internal.schema.Tuple` provides boxed and non-boxed
+value methods to avoid boxing in scenarios where boxing can be avoided (deserialization of non-null columns to
+POJO primitives, for example).
diff --git a/modules/schema/src/main/java/org/apache/ignite/internal/schema/package-info.java b/modules/schema/src/main/java/org/apache/ignite/internal/schema/package-info.java
index 5aaa359..fe5f0e3 100644
--- a/modules/schema/src/main/java/org/apache/ignite/internal/schema/package-info.java
+++ b/modules/schema/src/main/java/org/apache/ignite/internal/schema/package-info.java
@@ -18,51 +18,5 @@
 /**
  * <!-- Package description. -->
  * Contains schema description, tuple assembly and field accessor classes.
- * <p>
- * This package provides necessary infrastructure to create, read, convert to and from POJO classes
- * schema-defined tuples.
- * <p>
- * Schema is defined as a set of columns which are split into key columns chunk and value columns chunk.
- * Each column defined by a name, nullability flag, and a {@link org.apache.ignite.internal.schema.NativeType}.
- * Type is a thin wrapper over the {@link org.apache.ignite.internal.schema.NativeTypeSpec} to provide differentiation
- * between types of one kind with different size (an example of such differentiation is bitmask(n) or number(n)).
- * {@link org.apache.ignite.internal.schema.NativeTypeSpec} provides necessary indirection to read a column as a
- * {@code java.lang.Object} without needing to switch over the column type.
- * <p>
- * A tuple itself does not contain any type metadata and only contains necessary
- * information required for fast column lookup. In a tuple, key columns and value columns are separated
- * and written to chunks with identical structure (so that chunk is self-sufficient, and, provided with
- * the column types can be read independently).
- * Tuple structure has the following format:
- *
- * <pre>
- * +---------+----------+----------+-------------+
- * | Schema | Key | Key chunk | Value chunk |
- * | Version | Hash | Bytes | Bytes |
- * +---------+------ --+-----------+-------------+
- * | 2 bytes | 4 bytes | |
- * +---------+---------+-------------------------+
- * </pre>
- * Each bytes section has the following structure:
- * <pre>
- * +---------+----------+---------+------+--------+--------+
- * | Total | Vartable | Varlen | Null | Fixlen | Varlen |
- * | Length | Length | Offsets | Map | Bytes | Bytes |
- * +---------+----------+---------+------+--------+--------+
- * | 4 bytes | 2 bytes | |
- * +---------+---------------------------------------------+
- * </pre>
- * To assemble a tuple with some schema, an instance of {@link org.apache.ignite.internal.schema.TupleAssembler}
- * must be used which provides the low-level API for building tuples. When using the tuple assembler, the
- * columns must be passed to the assembler in the internal schema sort order.
Additionally, when constructing
- * the instance of the assembler, the user should pre-calculate the size of the tuple to avoid extra array copies,
- * and the number of non-null varlen columns for key and value chunks. Less restrictive building techniques
- * are provided by class (de)serializers and tuple builder, which take care of sizing and column order.
- * <p>
- * To read column values of a tuple, one needs to construct a subclass of
- * {@link org.apache.ignite.internal.schema.Tuple} which provides necessary logic to read arbitrary columns with
- * type checking. For primitive types, {@link org.apache.ignite.internal.schema.Tuple} provides boxed and non-boxed
- * value methods to avoid boxing in scenarios where boxing can be avoided (deserialization of non-null columns to
- * POJO primitives, for example).
  */
 package org.apache.ignite.internal.schema;
\ No newline at end of file
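
Editorial illustration (not part of the commit): the chunk layout in the new schema package README fixes the size of every chunk section except the column data itself, so the per-chunk overhead can be computed up front. The sketch below only restates that arithmetic. `ChunkSizing` and its method names are hypothetical and are not the `TupleAssembler` API; the field widths (4-byte full size, 2-byte offsets table size, 2-byte offset entries, one null-defaults bit per column) are taken from the README text in the diff above.

```java
/** Hypothetical size calculator for illustration only; not part of the Ignite schema module. */
final class ChunkSizing {
    /** "Full Size" field width, as given in the chunk layout. */
    static final int FULL_SIZE_FIELD_BYTES = 4;

    /** "Varsize Columns Offsets Table Size" field width. */
    static final int OFFSETS_TABLE_SIZE_FIELD_BYTES = 2;

    /** Width of a single entry in the varsize columns offsets table. */
    static final int OFFSET_ENTRY_BYTES = 2;

    private ChunkSizing() {
    }

    /** Null-defaults map size: one bit per column, rounded up to whole bytes. */
    static int nullDefaultsMapBytes(int columns) {
        return (columns + 7) / 8;
    }

    /** Offsets table size: one 2-byte entry per non-null, non-default varsize column. */
    static int offsetsTableBytes(int nonNullNonDefaultVarsizeColumns) {
        return nonNullNonDefaultVarsizeColumns * OFFSET_ENTRY_BYTES;
    }

    /** Bytes a chunk occupies before any fixsize or varsize column data. */
    static int chunkOverheadBytes(int columns, int nonNullNonDefaultVarsizeColumns) {
        return FULL_SIZE_FIELD_BYTES
            + OFFSETS_TABLE_SIZE_FIELD_BYTES
            + offsetsTableBytes(nonNullNonDefaultVarsizeColumns)
            + nullDefaultsMapBytes(columns);
    }

    public static void main(String[] args) {
        // A chunk with 9 columns, 3 of which are non-null, non-default varsize columns.
        System.out.println("overhead bytes: " + chunkOverheadBytes(9, 3));
    }
}
```

A caller that pre-calculates the tuple size, as the README asks of assembler users, would add the fixsize and varsize column payloads of both chunks and the 8-byte tuple header to this overhead.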