[incubator-pinot.wiki] branch master updated: Updated Architecture (markdown)

jlli Tue, 05 Feb 2019 13:35:40 -0800

This is an automated email from the ASF dual-hosted git repository.

jlli pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-pinot.wiki.git



The following commit(s) were added to refs/heads/master by this push:
     new f6ddd14  Updated Architecture (markdown)
f6ddd14 is described below

commit f6ddd144a140cce9e6518974cb85a1b9969eed78
Author: Jialiang Li <[email protected]>
AuthorDate: Tue Feb 5 13:35:17 2019 -0800

    Updated Architecture (markdown)
---
 Architecture.md | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/Architecture.md b/Architecture.md
index 45ec95f..c736e88 100644
--- a/Architecture.md
+++ b/Architecture.md
@@ -128,6 +128,8 @@ Pinot splits the entire data into multiple segments for manageability and effici
  for more info. As described in previous section, raw data is split into columnar data. This section describes the columnar data format for each of the possible data types.
 
 ##### Segment Entities
+Pinot now supports two versions of segments, i.e v1 and v3. They're almost the same, except that v1 segment contains multiple files for different indexing information, while v3 segment has only one file containing all the indexing information.
+
 ###### Segment Metadata (metadata.properties)
 
 This file contains metadata about the segment such as
@@ -402,21 +404,25 @@ Currently, we generate one dictionary per segment, in future, we will explore th
 
 ###### Single Value unsorted Forward Index (.sv.unsorted.fwd)
 
-If the values in a column are not sorted, we have the following possible optimizations:
+<div>If the values in a column are not sorted, we have the following possible optimizations:
 
 1.  Dictionary encoding if feasible. See previous section on when we apply dictionary encoding
 2.  <span style="line-height: 1.4285715;">Snappy or LZO or LZ4 or ZLIB compression</span>
 
 In the current version of Pinot, we only apply dictionary encoding that allows us to compress the data using Fixed bit encoding. In subsequent versions, we will evaluate other compression techniques such as snappy etc. While these compression techniques save space, there is additional over head to decompress them on the fly. The challenge here is to get the right trade-off between compressing the data and query latency.
+</div>
 
 ###### Multi Value Forward Index (.mv.fwd)
 
-</div>
-
-In some cases, the columns are multi-valued such as skill sets of a member. While dictionary encoding can be applied to multi value columns, forward index is as straight forward as in single value use case since the number of values per column can be arbitrary. The challenge here is to retrieve the values corresponding to a given document id without scanning the entire data. In order to achieve that, we create an a<span style="line-height: 1.4285715;">dditional Header section that stores [...]
+<div>In some cases, the columns are multi-valued such as skill sets of a member. While dictionary encoding can be applied to multi value columns, forward index is as straight forward as in single value use case since the number of values per column can be arbitrary. The challenge here is to retrieve the values corresponding to a given document id without scanning the entire data. In order to achieve that, we create an a<span style="line-height: 1.4285715;">dditional Header section that s [...]
 
 <span style="line-height: 1.4285715;">
 [[image2015-5-19 0-58-54.png]]
+</div>
+
+###### V3 Segment Index (index_map)
+<div> There's only one file for indexing information in V3 format segments. The benefit of using V3 segment is to reduce the number of files open.
+</div>
 
 ### Query Processing
 


