This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/orc-format.git
The following commit(s) were added to refs/heads/main by this push:
new 509c4c0 Add markdown images (#16)
509c4c0 is described below
commit 509c4c04fbee043a59dfbe64fa65c31e328490b6
Author: cxzl25 <[email protected]>
AuthorDate: Fri Feb 2 01:48:01 2024 +0800
Add markdown images (#16)
---
specification/ORCv0.md | 6 +++---
specification/ORCv1.md | 10 +++++-----
specification/ORCv2.md | 10 +++++-----
specification/img/BloomFilter.png | Bin 0 -> 61887 bytes
specification/img/CompressionStream.png | Bin 0 -> 91623 bytes
specification/img/Direct.png | Bin 0 -> 64400 bytes
specification/img/OrcFileLayout.png | Bin 0 -> 127908 bytes
specification/img/TreeWriters.png | Bin 0 -> 134465 bytes
8 files changed, 13 insertions(+), 13 deletions(-)
diff --git a/specification/ORCv0.md b/specification/ORCv0.md
index f0840f8..fe0d8aa 100644
--- a/specification/ORCv0.md
+++ b/specification/ORCv0.md
@@ -27,7 +27,7 @@ include the minimum and maximum values for each column in each set of
file reader can skip entire sets of rows that aren't important for
this query.
-
+
# File Tail
@@ -158,7 +158,7 @@ All of the rows in an ORC file must have the same schema. Logically
the schema is expressed as a tree as in the figure below, where
the compound types have subcolumns under them.
-
+
The equivalent Hive DDL would be:
@@ -381,7 +381,7 @@ for a chunk that compressed to 100,000 bytes would be [0x40, 0x0d,
that as long as a decompressor starts at the top of a header, it can
start decompressing without the previous bytes.
-
+
The default compression chunk size is 256K, but writers can choose
their own value. Larger chunks lead to better compression, but require
diff --git a/specification/ORCv1.md b/specification/ORCv1.md
index 2834764..08bd436 100644
--- a/specification/ORCv1.md
+++ b/specification/ORCv1.md
@@ -27,7 +27,7 @@ include the minimum and maximum values for each column in each set of
file reader can skip entire sets of rows that aren't important for
this query.
-
+
# File Tail
@@ -200,7 +200,7 @@ All of the rows in an ORC file must have the same schema. Logically
the schema is expressed as a tree as in the figure below, where
the compound types have subcolumns under them.
-
+
The equivalent Hive DDL would be:
@@ -619,7 +619,7 @@ for a chunk that compressed to 100,000 bytes would be [0x40, 0x0d,
that as long as a decompressor starts at the top of a header, it can
start decompressing without the previous bytes.
-
+
The default compression chunk size is 256K, but writers can choose
their own value. Larger chunks lead to better compression, but require
@@ -796,7 +796,7 @@ length of 4 (3) as [0x5e, 0x03, 0x5c, 0xa1, 0xab, 0x1e, 0xde, 0xad,
> Note: the run length(4) is one-off. We can get 4 by adding 1 to 3 (See [Hive-4123](https://github.com/apache/hive/commit/69deabeaac020ba60b0f2156579f53e9fe46157a#diff-c00fea1863eaf0d6f047535e874274199020ffed3eb00deb897f513aa86f6b59R232-R236))
-
+
### Patched Base
@@ -1334,4 +1334,4 @@ Bloom filter streams are interlaced with row group indexes. This placement
makes it convenient to read the bloom filter stream and row index stream
together in single read operation.
-
+
diff --git a/specification/ORCv2.md b/specification/ORCv2.md
index 010de73..b263af6 100644
--- a/specification/ORCv2.md
+++ b/specification/ORCv2.md
@@ -47,7 +47,7 @@ include the minimum and maximum values for each column in each set of
file reader can skip entire sets of rows that aren't important for
this query.
-
+
# File Tail
@@ -220,7 +220,7 @@ All of the rows in an ORC file must have the same schema. Logically
the schema is expressed as a tree as in the figure below, where
the compound types have subcolumns under them.
-
+
The equivalent Hive DDL would be:
@@ -638,7 +638,7 @@ for a chunk that compressed to 100,000 bytes would be [0x40, 0x0d,
that as long as a decompressor starts at the top of a header, it can
start decompressing without the previous bytes.
-
+
The default compression chunk size is 256K, but writers can choose
their own value. Larger chunks lead to better compression, but require
@@ -815,7 +815,7 @@ length of 4 (3) as [0x5e, 0x03, 0x5c, 0xa1, 0xab, 0x1e, 0xde, 0xad,
> Note: the run length(4) is one-off. We can get 4 by adding 1 to 3 (See [Hive-4123](https://github.com/apache/hive/commit/69deabeaac020ba60b0f2156579f53e9fe46157a#diff-c00fea1863eaf0d6f047535e874274199020ffed3eb00deb897f513aa86f6b59R232-R236))
-
+
### Patched Base
@@ -1350,4 +1350,4 @@ Bloom filter streams are interlaced with row group indexes. This placement
makes it convenient to read the bloom filter stream and row index stream
together in single read operation.
-
+
diff --git a/specification/img/BloomFilter.png b/specification/img/BloomFilter.png
new file mode 100644
index 0000000..702fdf0
Binary files /dev/null and b/specification/img/BloomFilter.png differ
diff --git a/specification/img/CompressionStream.png b/specification/img/CompressionStream.png
new file mode 100644
index 0000000..85b88bc
Binary files /dev/null and b/specification/img/CompressionStream.png differ
diff --git a/specification/img/Direct.png b/specification/img/Direct.png
new file mode 100644
index 0000000..eadf5ff
Binary files /dev/null and b/specification/img/Direct.png differ
diff --git a/specification/img/OrcFileLayout.png b/specification/img/OrcFileLayout.png
new file mode 100644
index 0000000..ca0d456
Binary files /dev/null and b/specification/img/OrcFileLayout.png differ
diff --git a/specification/img/TreeWriters.png b/specification/img/TreeWriters.png
new file mode 100644
index 0000000..395e99d
Binary files /dev/null and b/specification/img/TreeWriters.png differ