This is an automated email from the ASF dual-hosted git repository.
jlli pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-pinot.wiki.git
The following commit(s) were added to refs/heads/master by this push:
new f6ddd14 Updated Architecture (markdown)
f6ddd14 is described below
commit f6ddd144a140cce9e6518974cb85a1b9969eed78
Author: Jialiang Li <[email protected]>
AuthorDate: Tue Feb 5 13:35:17 2019 -0800
Updated Architecture (markdown)
---
Architecture.md | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/Architecture.md b/Architecture.md
index 45ec95f..c736e88 100644
--- a/Architecture.md
+++ b/Architecture.md
@@ -128,6 +128,8 @@ Pinot splits the entire data into multiple segments for manageability and effici
for more info. As described in the previous section, raw data is split into
columnar data. This section describes the columnar data format for each of the
possible data types.
##### Segment Entities
+Pinot now supports two segment format versions: v1 and v3. They are almost the
same, except that a v1 segment contains multiple files for different indexing
information, while a v3 segment has a single file containing all the indexing
information.
+
###### Segment Metadata (metadata.properties)
This file contains metadata about the segment such as
@@ -402,21 +404,25 @@ Currently, we generate one dictionary per segment, in future, we will explore th
###### Single Value unsorted Forward Index (.sv.unsorted.fwd)
-If the values in a column are not sorted, we have the following possible
optimizations:
+<div>If the values in a column are not sorted, we have the following possible
optimizations:
1. Dictionary encoding, if feasible. See the previous section on when we apply
dictionary encoding.
2. Snappy, LZO, LZ4, or ZLIB compression
In the current version of Pinot, we apply only dictionary encoding, which allows
us to compress the data using fixed-bit encoding. In subsequent versions, we
will evaluate other compression techniques such as Snappy. While these
compression techniques save space, there is additional overhead to decompress
the data on the fly. The challenge here is to strike the right trade-off between
compression ratio and query latency.
+</div>
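To make the trade-off above concrete, here is a minimal Python sketch of the two options. It is illustrative only and not Pinot's on-disk format; all function names are hypothetical. Dictionary ids are packed at the smallest fixed bit width that fits the dictionary's cardinality, so any row can be decoded directly from its bit offset. The zlib variant stands in for the Snappy/LZO/LZ4/ZLIB family and uses a hypothetical chunk size: it must decompress a whole chunk before it can return a single row, which is the query-latency cost described above.

```python
import zlib
from typing import Dict, List, Sequence, Tuple


def build_dictionary(values: Sequence[str]) -> Tuple[List[str], List[int]]:
    """Dictionary-encode a column: sorted distinct values plus one id per row."""
    dictionary = sorted(set(values))
    id_of: Dict[str, int] = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [id_of[v] for v in values]


def bits_needed(cardinality: int) -> int:
    """Smallest fixed bit width that can hold ids 0..cardinality-1 (minimum 1)."""
    return max(1, (cardinality - 1).bit_length())


def pack(ids: Sequence[int], width: int) -> bytes:
    """Fixed-bit encoding: write each id as a `width`-bit field, MSB first."""
    acc = 0
    for i in ids:
        acc = (acc << width) | i
    total_bits = len(ids) * width
    pad = (-total_bits) % 8  # zero-fill the last byte
    return (acc << pad).to_bytes((total_bits + pad) // 8, "big")


def read_id(packed: bytes, width: int, doc_id: int) -> int:
    """Random access: decode one row's id from its bit offset, no decompression."""
    start = doc_id * width
    first, last = start // 8, (start + width - 1) // 8
    window = int.from_bytes(packed[first:last + 1], "big")
    shift = (last + 1) * 8 - (start + width)
    return (window >> shift) & ((1 << width) - 1)


def compress_chunks(values: Sequence[str], chunk_size: int) -> List[bytes]:
    """General-purpose alternative: zlib-compress fixed-size chunks of rows.

    Assumes values contain no newline, since rows are newline-joined.
    """
    return [
        zlib.compress("\n".join(values[i:i + chunk_size]).encode("utf-8"))
        for i in range(0, len(values), chunk_size)
    ]


def read_value(chunks: List[bytes], chunk_size: int, doc_id: int) -> str:
    """Reading one row requires decompressing its entire chunk first."""
    rows = zlib.decompress(chunks[doc_id // chunk_size]).decode("utf-8").split("\n")
    return rows[doc_id % chunk_size]


values = ["US", "IN", "US", "CN", "IN", "US"]
dictionary, ids = build_dictionary(values)  # ['CN', 'IN', 'US'], [2, 1, 2, 0, 1, 2]
width = bits_needed(len(dictionary))        # 2 bits per row
packed = pack(ids, width)                   # 12 bits, padded to 2 bytes
print(dictionary[read_id(packed, width, 3)])         # CN
print(read_value(compress_chunks(values, 4), 4, 3))  # CN
```

With only three distinct values, each row costs 2 bits, so the six rows fit in 2 bytes and any row can be read without touching its neighbours. The zlib path may compress better on larger or more repetitive data, but every lookup pays for decompressing its chunk.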
###### Multi Value Forward Index (.mv.fwd)
-</div>
-
-In some cases, columns are multi-valued, such as the skill sets of a member.
While dictionary encoding can be applied to multi-value columns, the forward index
is not as straightforward as in the single-value case, since the number of values
per document can be arbitrary. The challenge here is to retrieve the values
corresponding to a given document id without scanning the entire data. In order
to achieve that, we create an a<span style="line-height: 1.4285715;">dditional
Header section that stores [...]
+<div>In some cases, columns are multi-valued, such as the skill sets of a
member. While dictionary encoding can be applied to multi-value columns, the
forward index is not as straightforward as in the single-value case, since the
number of values per document can be arbitrary. The challenge here is to retrieve
the values corresponding to a given document id without scanning the entire
data. In order to achieve that, we create an a<span style="line-height:
1.4285715;">dditional Header section that s [...]
<span style="line-height: 1.4285715;">
[[image2015-5-19 0-58-54.png]]
+</div>
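The description of the header section is cut off in this email, so the following is only a generic sketch of one common way to avoid scanning, not Pinot's actual layout. It assumes a hypothetical header that stores, for each document, the offset where that document's values begin in a flat data section, plus one end sentinel. A lookup then costs two header reads and one slice. The values are assumed to be dictionary ids, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def build_mv_index(rows: Sequence[Sequence[int]]) -> Tuple[List[int], List[int]]:
    """Flatten multi-value rows into (header, data).

    header[d] is where document d's values start in `data`; the extra final
    entry marks where the last document ends.
    """
    header: List[int] = [0]
    data: List[int] = []
    for row in rows:
        data.extend(row)
        header.append(len(data))
    return header, data


def values_for(header: Sequence[int], data: Sequence[int], doc_id: int) -> List[int]:
    """Fetch one document's values via two header reads; `data` is never scanned."""
    return list(data[header[doc_id]:header[doc_id + 1]])


# Hypothetical dictionary ids for each member's skills; member 1 has none.
rows = [[3, 7], [], [1, 2, 5]]
header, data = build_mv_index(rows)
print(header)                       # [0, 2, 2, 5]
print(values_for(header, data, 2))  # [1, 2, 5]
```

Because the header is indexed by document id, the cost of a lookup does not depend on how many values other documents hold; documents with no values simply have two equal consecutive offsets.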
+
+###### V3 Segment Index (index_map)
+<div>In the V3 format, all indexing information for a segment is stored in a single file.
The benefit of the V3 format is that it reduces the number of open files.
+</div>
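The contents of `index_map` are not described here, so the following is a hypothetical sketch of the idea behind a single-file segment rather than Pinot's actual format. Every per-column index buffer is appended to one file, and a small map records each buffer's (start offset, size) so that a reader can seek straight to the index it needs. The key names, function names, and the in-memory dictionary standing in for `index_map` are all assumptions.

```python
import io
from typing import BinaryIO, Dict, Tuple

# Hypothetical stand-in for index_map: index key -> (start offset, size in bytes).
IndexMap = Dict[str, Tuple[int, int]]


def write_single_file(buffers: Dict[str, bytes], out: BinaryIO) -> IndexMap:
    """Append every index buffer to one file, recording where each one landed."""
    index_map: IndexMap = {}
    for key, buf in buffers.items():
        index_map[key] = (out.tell(), len(buf))
        out.write(buf)
    return index_map


def read_index(f: BinaryIO, index_map: IndexMap, key: str) -> bytes:
    """Seek directly to one index inside the shared file and read only it."""
    start, size = index_map[key]
    f.seek(start)
    return f.read(size)


# Hypothetical per-column index buffers that a v1-style segment would keep
# in separate files.
buffers = {
    "country.dictionary": b"CN,IN,US",
    "country.forward_index": b"\x98\x60",
    "skills.forward_index": b"\x03\x07\x01\x02\x05",
}
segment_file = io.BytesIO()  # one open file handle instead of three
index_map = write_single_file(buffers, segment_file)
print(index_map["country.forward_index"])  # (8, 2)
print(read_index(segment_file, index_map, "skills.forward_index"))
```

The design choice is to trade many small files for one file plus a lookup table: the number of open handles no longer grows with the number of columns and index types, at the cost of keeping the offset map consistent with the file.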
### Query Processing