[GitHub] [druid] paul-rogers commented on a diff in pull request #12344: Segments doc refactor

GitBox Wed, 01 Jun 2022 20:05:48 -0700


paul-rogers commented on code in PR #12344:
URL: https://github.com/apache/druid/pull/12344#discussion_r887431487



##########
docs/design/segments.md:
##########
@@ -23,231 +23,198 @@ title: "Segments"
   -->
 
 
-Apache Druid stores its index in *segment files*, which are partitioned by
-time. In a basic setup, one segment file is created for each time
+Apache Druid stores its index in *segment files* partitioned by

Review Comment:
   "stores its index" --> "stores its data and indexes"
   
   The segment files store data, metadata and indexes.



##########
docs/design/segments.md:
##########
@@ -23,231 +23,198 @@ title: "Segments"
   -->
 
 
-Apache Druid stores its index in *segment files*, which are partitioned by
-time. In a basic setup, one segment file is created for each time
+Apache Druid stores its index in *segment files* partitioned by
+time. In a basic setup, Druid creates one segment file for each time
 interval, where the time interval is configurable in the
 `segmentGranularity` parameter of the
-[`granularitySpec`](../ingestion/ingestion-spec.md#granularityspec).  For 
Druid to
-operate well under heavy query load, it is important for the segment
+[`granularitySpec`](../ingestion/ingestion-spec.md#granularityspec).
+
+For Druid to operate well under heavy query load, it is important for the 
segment
 file size to be within the recommended range of 300MB-700MB. If your
 segment files are larger than this range, then consider either
 changing the granularity of the time interval or partitioning your
-data and tweaking the `targetRowsPerSegment` in your `partitionsSpec`
-(a good starting point for this parameter is 5 million rows).  See the
-sharding section below and the 'Partitioning specification' section of
+data and adjusting the `targetRowsPerSegment` in your `partitionsSpec`.

Review Comment:
   and --> and/or



##########
docs/design/segments.md:
##########
@@ -23,231 +23,198 @@ title: "Segments"
   -->
 
 
-Apache Druid stores its index in *segment files*, which are partitioned by
-time. In a basic setup, one segment file is created for each time
+Apache Druid stores its index in *segment files* partitioned by
+time. In a basic setup, Druid creates one segment file for each time
 interval, where the time interval is configurable in the
 `segmentGranularity` parameter of the
-[`granularitySpec`](../ingestion/ingestion-spec.md#granularityspec).  For 
Druid to
-operate well under heavy query load, it is important for the segment
+[`granularitySpec`](../ingestion/ingestion-spec.md#granularityspec).
+
+For Druid to operate well under heavy query load, it is important for the 
segment
 file size to be within the recommended range of 300MB-700MB. If your
 segment files are larger than this range, then consider either
 changing the granularity of the time interval or partitioning your
-data and tweaking the `targetRowsPerSegment` in your `partitionsSpec`
-(a good starting point for this parameter is 5 million rows).  See the
-sharding section below and the 'Partitioning specification' section of
+data and adjusting the `targetRowsPerSegment` in your `partitionsSpec`.
+A good starting point for this parameter is 5 million rows.
+
+See the Sharding section below and the 'Partitioning specification' section of
 the [Batch ingestion](../ingestion/hadoop.md#partitionsspec) documentation
-for more information.
+for more guidance.
 
-### A segment file's core data structures
+## Segment file structure
 
-Here we describe the internal structure of segment files, which is
-essentially *columnar*: the data for each column is laid out in
-separate data structures. By storing each column separately, Druid can
-decrease query latency by scanning only those columns actually needed
-for a query.  There are three basic column types: the timestamp
-column, dimension columns, and metric columns, as illustrated in the
-image below:
+Segment files are *columnar*: the data for each column is laid out in
+separate data structures. By storing each column separately, Druid decreases 
query latency by scanning only those columns actually needed for a query.  
There are three basic column types: timestamp, dimensions, and metrics:
 
 ![Druid column types](../assets/druid-column-types.png "Druid Column Types")
 
-The timestamp and metric columns are simple: behind the scenes each of
-these is an array of integer or floating point values compressed with
-LZ4. Once a query knows which rows it needs to select, it simply
-decompresses these, pulls out the relevant rows, and applies the
-desired aggregation operator. As with all columns, if a query doesn’t
-require a column, then that column’s data is just skipped over.
+Timestamp and metrics type columns are arrays of integer or floating point 
values compressed with
+[LZ4](https://github.com/lz4/lz4-java). Once a query identifies which rows to 
select, it decompresses them, pulls out the relevant rows, and applies the
+desired aggregation operator. If a query doesn’t require a column, Druid skips 
over that column's data.
 
-Dimensions columns are different because they support filter and
+Dimension columns are different because they support filter and
 group-by operations, so each dimension requires the following
 three data structures:
 
-1. A dictionary that maps values (which are always treated as strings) to 
integer IDs,
-2. A list of the column’s values, encoded using the dictionary in 1, and
-3. For each distinct value in the column, a bitmap that indicates which rows 
contain that value.
-
-
-Why these three data structures? The dictionary simply maps string
-values to integer ids so that the values in \(2\) and \(3\) can be
-represented compactly. The bitmaps in \(3\) -- also known as *inverted
-indexes* allow for quick filtering operations (specifically, bitmaps
-are convenient for quickly applying AND and OR operators). Finally,
-the list of values in \(2\) is needed for *group by* and *TopN*
-queries. In other words, queries that solely aggregate metrics based
-on filters do not need to touch the list of dimension values stored in \(2\).
+- Dictionary: Maps values (which are always treated as strings) to integer 
IDs, allowing compact representation of the list and bitmap values.
+- List: The column’s values, encoded using the dictionary. Required for 
GroupBy and TopN queries. These operators allow queries that solely aggregate 
metrics based on filters to run without accessing the list of values.
+- Bitmap: One bitmap for each distinct value in the column, to indicate which 
rows contain that value. Bitmaps allow for quick filtering operations because 
they are convenient for quickly applying AND and OR operators. Also known as 
inverted indexes.
 
-To get a concrete sense of these data structures, consider the ‘page’
-column from the example data above.  The three data structures that
-represent this dimension are illustrated in the diagram below.
+To get a better sense of these data structures, consider the ‘page’ column 
from the given example data as represented by the following data structures:
 
 ```
-1: Dictionary that encodes column values
-  {
+1: Dictionary
+   {
     "Justin Bieber": 0,
     "Ke$ha":         1
-  }
+   }
 
-2: Column data
-  [0,
+2: List of column data
+   [0,
    0,
    1,
    1]
 
-3: Bitmaps - one for each unique value of the column
-  value="Justin Bieber": [1,1,0,0]
-  value="Ke$ha":         [0,0,1,1]
+3: Bitmaps
+   value="Justin Bieber": [1,1,0,0]
+   value="Ke$ha":         [0,0,1,1]
 ```
 
-Note that the bitmap is different from the first two data structures:
-whereas the first two grow linearly in the size of the data (in the
-worst case), the size of the bitmap section is the product of data
-size * column cardinality. Compression will help us here though
-because we know that for each row in 'column data', there will only be a
-single bitmap that has non-zero entry. This means that high cardinality
-columns will have extremely sparse, and therefore highly compressible,
-bitmaps. Druid exploits this using compression algorithms that are
-specially suited for bitmaps, such as roaring bitmap compression.
+Note that the bitmap is different from the dictionary and list data 
structures: the dictionary and list grow linearly with the size of the data, 
but the size of the bitmap section is the product of data size * column 
cardinality. 
+
+For each row in the list of column data, there is only a single bitmap that 
has a non-zero entry. This means that high cardinality columns have extremely 
sparse, and therefore highly compressible, bitmaps. Druid exploits this using 
compression algorithms that are specially suited for bitmaps, such as [Roaring 
bitmap compression](https://github.com/RoaringBitmap/RoaringBitmap).
+
+## Handling null values
+
+By default, Druid string dimension columns use the values `''` and `null` 
interchangeably and numeric and metric columns can not represent `null` at all, 
instead coercing nulls to `0`. However, Druid also provides a SQL compatible 
null handling mode, which you can enable at the system level, through 
`druid.generic.useDefaultValueForNull`. This setting, when set to `false`, 
allows Druid to create segments _at ingestion time_ in which the string columns 
can distinguish `''` from `null`, and numeric columns which can represent 
`null` valued rows instead of `0`.
+
+String dimension columns contain no additional column structures in this mode, 
instead they reserve an additional dictionary entry for the `null` value. 
Numeric columns are stored in the segment with an additional bitmap in which 
the set bits indicate `null` valued rows. 
+
+In addition to slightly increased segment sizes, SQL compatible null handling 
can incur a performance cost at query time, due to the need to check the null 
bitmap. This performance cost only occurs for columns that actually contain 
null values.
+
+## Segments with different schemas
+
+Druid segments for the same datasource may have different schemas. If a string 
column (dimension) exists in one segment but not another, queries that involve 
both segments still work. Queries for the segment without the dimension behave 
as if the dimension contains only null values. Similarly, if one segment has a 
numeric column (metric) but another does not, queries on the segment without 
the metric generally operate as expected. Aggregations over the missing metric 
operate as if the metric doesn't exist.
+
+## Column format
+
+Each column is stored as two parts:
+
+- A Jackson-serialized ColumnDescriptor.
+- The rest of the binary for the column.
+
+A ColumnDescriptor is an object that allows the use of Jackson's polymorphic 
deserialization to add new and interesting methods of serialization with 
minimal impact to the code. It consists of some metadata about the column (for 
example: type, whether it's multi-value) and a list of 
serialization/deserialization logic that can deserialize the rest of the binary.

Review Comment:
   Back-tick quote again.
   
   Not user this paragraph is of value to most _users_. It seems of more 
interest to _developers_ of Druid or extensions.



##########
docs/design/segments.md:
##########
@@ -23,231 +23,198 @@ title: "Segments"
   -->
 
 
-Apache Druid stores its index in *segment files*, which are partitioned by
-time. In a basic setup, one segment file is created for each time
+Apache Druid stores its index in *segment files* partitioned by
+time. In a basic setup, Druid creates one segment file for each time

Review Comment:
   "one segment for each time interval" --> "one segment for each segment time 
interval"
   
   When I first read this in the original docs, I found it confusing. A "time 
interval" can be anything: the amount of time between your PR and my review, 
say. The relevant interval is something variously called the "segment 
granularity", "chunk size", "segment interval", etc. Would be great for the doc 
team to standardize on a name (or maybe we have a preferred term we can use 
here?)



##########
docs/design/segments.md:
##########
@@ -23,231 +23,198 @@ title: "Segments"
   -->
 
 
-Apache Druid stores its index in *segment files*, which are partitioned by
-time. In a basic setup, one segment file is created for each time
+Apache Druid stores its index in *segment files* partitioned by
+time. In a basic setup, Druid creates one segment file for each time
 interval, where the time interval is configurable in the
 `segmentGranularity` parameter of the
-[`granularitySpec`](../ingestion/ingestion-spec.md#granularityspec).  For 
Druid to
-operate well under heavy query load, it is important for the segment
+[`granularitySpec`](../ingestion/ingestion-spec.md#granularityspec).
+
+For Druid to operate well under heavy query load, it is important for the 
segment
 file size to be within the recommended range of 300MB-700MB. If your
 segment files are larger than this range, then consider either
 changing the granularity of the time interval or partitioning your

Review Comment:
   changing the granularity of the segment time interval
   
   (vague terminology again)



##########
docs/design/segments.md:
##########
@@ -23,231 +23,198 @@ title: "Segments"
   -->
 
 
-Apache Druid stores its index in *segment files*, which are partitioned by
-time. In a basic setup, one segment file is created for each time
+Apache Druid stores its index in *segment files* partitioned by
+time. In a basic setup, Druid creates one segment file for each time
 interval, where the time interval is configurable in the
 `segmentGranularity` parameter of the
-[`granularitySpec`](../ingestion/ingestion-spec.md#granularityspec).  For 
Druid to
-operate well under heavy query load, it is important for the segment
+[`granularitySpec`](../ingestion/ingestion-spec.md#granularityspec).
+
+For Druid to operate well under heavy query load, it is important for the 
segment
 file size to be within the recommended range of 300MB-700MB. If your
 segment files are larger than this range, then consider either
 changing the granularity of the time interval or partitioning your
-data and tweaking the `targetRowsPerSegment` in your `partitionsSpec`
-(a good starting point for this parameter is 5 million rows).  See the
-sharding section below and the 'Partitioning specification' section of
+data and adjusting the `targetRowsPerSegment` in your `partitionsSpec`.
+A good starting point for this parameter is 5 million rows.
+
+See the Sharding section below and the 'Partitioning specification' section of
 the [Batch ingestion](../ingestion/hadoop.md#partitionsspec) documentation
-for more information.
+for more guidance.
 
-### A segment file's core data structures
+## Segment file structure
 
-Here we describe the internal structure of segment files, which is
-essentially *columnar*: the data for each column is laid out in
-separate data structures. By storing each column separately, Druid can
-decrease query latency by scanning only those columns actually needed
-for a query.  There are three basic column types: the timestamp
-column, dimension columns, and metric columns, as illustrated in the
-image below:
+Segment files are *columnar*: the data for each column is laid out in
+separate data structures. By storing each column separately, Druid decreases 
query latency by scanning only those columns actually needed for a query.  
There are three basic column types: timestamp, dimensions, and metrics:
 
 ![Druid column types](../assets/druid-column-types.png "Druid Column Types")
 
-The timestamp and metric columns are simple: behind the scenes each of
-these is an array of integer or floating point values compressed with
-LZ4. Once a query knows which rows it needs to select, it simply
-decompresses these, pulls out the relevant rows, and applies the
-desired aggregation operator. As with all columns, if a query doesn’t
-require a column, then that column’s data is just skipped over.
+Timestamp and metrics type columns are arrays of integer or floating point 
values compressed with
+[LZ4](https://github.com/lz4/lz4-java). Once a query identifies which rows to 
select, it decompresses them, pulls out the relevant rows, and applies the
+desired aggregation operator. If a query doesn’t require a column, Druid skips 
over that column's data.
 
-Dimensions columns are different because they support filter and
+Dimension columns are different because they support filter and
 group-by operations, so each dimension requires the following
 three data structures:
 
-1. A dictionary that maps values (which are always treated as strings) to 
integer IDs,
-2. A list of the column’s values, encoded using the dictionary in 1, and
-3. For each distinct value in the column, a bitmap that indicates which rows 
contain that value.
-
-
-Why these three data structures? The dictionary simply maps string
-values to integer ids so that the values in \(2\) and \(3\) can be
-represented compactly. The bitmaps in \(3\) -- also known as *inverted
-indexes* allow for quick filtering operations (specifically, bitmaps
-are convenient for quickly applying AND and OR operators). Finally,
-the list of values in \(2\) is needed for *group by* and *TopN*
-queries. In other words, queries that solely aggregate metrics based
-on filters do not need to touch the list of dimension values stored in \(2\).
+- Dictionary: Maps values (which are always treated as strings) to integer 
IDs, allowing compact representation of the list and bitmap values.
+- List: The column’s values, encoded using the dictionary. Required for 
GroupBy and TopN queries. These operators allow queries that solely aggregate 
metrics based on filters to run without accessing the list of values.
+- Bitmap: One bitmap for each distinct value in the column, to indicate which 
rows contain that value. Bitmaps allow for quick filtering operations because 
they are convenient for quickly applying AND and OR operators. Also known as 
inverted indexes.
 
-To get a concrete sense of these data structures, consider the ‘page’
-column from the example data above.  The three data structures that
-represent this dimension are illustrated in the diagram below.
+To get a better sense of these data structures, consider the ‘page’ column 
from the given example data as represented by the following data structures:
 
 ```
-1: Dictionary that encodes column values
-  {
+1: Dictionary
+   {
     "Justin Bieber": 0,
     "Ke$ha":         1
-  }
+   }
 
-2: Column data
-  [0,
+2: List of column data
+   [0,
    0,
    1,
    1]
 
-3: Bitmaps - one for each unique value of the column
-  value="Justin Bieber": [1,1,0,0]
-  value="Ke$ha":         [0,0,1,1]
+3: Bitmaps
+   value="Justin Bieber": [1,1,0,0]
+   value="Ke$ha":         [0,0,1,1]
 ```
 
-Note that the bitmap is different from the first two data structures:
-whereas the first two grow linearly in the size of the data (in the
-worst case), the size of the bitmap section is the product of data
-size * column cardinality. Compression will help us here though
-because we know that for each row in 'column data', there will only be a
-single bitmap that has non-zero entry. This means that high cardinality
-columns will have extremely sparse, and therefore highly compressible,
-bitmaps. Druid exploits this using compression algorithms that are
-specially suited for bitmaps, such as roaring bitmap compression.
+Note that the bitmap is different from the dictionary and list data 
structures: the dictionary and list grow linearly with the size of the data, 
but the size of the bitmap section is the product of data size * column 
cardinality. 
+
+For each row in the list of column data, there is only a single bitmap that 
has a non-zero entry. This means that high cardinality columns have extremely 
sparse, and therefore highly compressible, bitmaps. Druid exploits this using 
compression algorithms that are specially suited for bitmaps, such as [Roaring 
bitmap compression](https://github.com/RoaringBitmap/RoaringBitmap).
+
+## Handling null values
+
+By default, Druid string dimension columns use the values `''` and `null` 
interchangeably and numeric and metric columns can not represent `null` at all, 
instead coercing nulls to `0`. However, Druid also provides a SQL compatible 
null handling mode, which you can enable at the system level, through 
`druid.generic.useDefaultValueForNull`. This setting, when set to `false`, 
allows Druid to create segments _at ingestion time_ in which the string columns 
can distinguish `''` from `null`, and numeric columns which can represent 
`null` valued rows instead of `0`.
+
+String dimension columns contain no additional column structures in this mode, 
instead they reserve an additional dictionary entry for the `null` value. 
Numeric columns are stored in the segment with an additional bitmap in which 
the set bits indicate `null` valued rows. 

Review Comment:
   `in this mode`: in _which_ mode? To keep things sane, perhaps have one 
section for SQL-compatible null behavior, another for Druid Native behavior. (I 
call it "Druid Native" because "replace nulls with blanks or zeros behavior" is 
too much of a mouthful.)



##########
docs/design/segments.md:
##########
@@ -23,231 +23,198 @@ title: "Segments"
   -->
 
 
-Apache Druid stores its index in *segment files*, which are partitioned by
-time. In a basic setup, one segment file is created for each time
+Apache Druid stores its index in *segment files* partitioned by
+time. In a basic setup, Druid creates one segment file for each time
 interval, where the time interval is configurable in the
 `segmentGranularity` parameter of the
-[`granularitySpec`](../ingestion/ingestion-spec.md#granularityspec).  For 
Druid to
-operate well under heavy query load, it is important for the segment
+[`granularitySpec`](../ingestion/ingestion-spec.md#granularityspec).
+
+For Druid to operate well under heavy query load, it is important for the 
segment
 file size to be within the recommended range of 300MB-700MB. If your
 segment files are larger than this range, then consider either
 changing the granularity of the time interval or partitioning your
-data and tweaking the `targetRowsPerSegment` in your `partitionsSpec`
-(a good starting point for this parameter is 5 million rows).  See the
-sharding section below and the 'Partitioning specification' section of
+data and adjusting the `targetRowsPerSegment` in your `partitionsSpec`.
+A good starting point for this parameter is 5 million rows.
+
+See the Sharding section below and the 'Partitioning specification' section of
 the [Batch ingestion](../ingestion/hadoop.md#partitionsspec) documentation
-for more information.
+for more guidance.
 
-### A segment file's core data structures
+## Segment file structure
 
-Here we describe the internal structure of segment files, which is
-essentially *columnar*: the data for each column is laid out in
-separate data structures. By storing each column separately, Druid can
-decrease query latency by scanning only those columns actually needed
-for a query.  There are three basic column types: the timestamp
-column, dimension columns, and metric columns, as illustrated in the
-image below:
+Segment files are *columnar*: the data for each column is laid out in
+separate data structures. By storing each column separately, Druid decreases 
query latency by scanning only those columns actually needed for a query.  
There are three basic column types: timestamp, dimensions, and metrics:
 
 ![Druid column types](../assets/druid-column-types.png "Druid Column Types")
 
-The timestamp and metric columns are simple: behind the scenes each of
-these is an array of integer or floating point values compressed with
-LZ4. Once a query knows which rows it needs to select, it simply
-decompresses these, pulls out the relevant rows, and applies the
-desired aggregation operator. As with all columns, if a query doesn’t
-require a column, then that column’s data is just skipped over.
+Timestamp and metrics type columns are arrays of integer or floating point 
values compressed with
+[LZ4](https://github.com/lz4/lz4-java). Once a query identifies which rows to 
select, it decompresses them, pulls out the relevant rows, and applies the
+desired aggregation operator. If a query doesn’t require a column, Druid skips 
over that column's data.
 
-Dimensions columns are different because they support filter and
+Dimension columns are different because they support filter and
 group-by operations, so each dimension requires the following
 three data structures:
 
-1. A dictionary that maps values (which are always treated as strings) to 
integer IDs,
-2. A list of the column’s values, encoded using the dictionary in 1, and
-3. For each distinct value in the column, a bitmap that indicates which rows 
contain that value.
-
-
-Why these three data structures? The dictionary simply maps string
-values to integer ids so that the values in \(2\) and \(3\) can be
-represented compactly. The bitmaps in \(3\) -- also known as *inverted
-indexes* allow for quick filtering operations (specifically, bitmaps
-are convenient for quickly applying AND and OR operators). Finally,
-the list of values in \(2\) is needed for *group by* and *TopN*
-queries. In other words, queries that solely aggregate metrics based
-on filters do not need to touch the list of dimension values stored in \(2\).
+- Dictionary: Maps values (which are always treated as strings) to integer 
IDs, allowing compact representation of the list and bitmap values.
+- List: The column’s values, encoded using the dictionary. Required for 
GroupBy and TopN queries. These operators allow queries that solely aggregate 
metrics based on filters to run without accessing the list of values.
+- Bitmap: One bitmap for each distinct value in the column, to indicate which 
rows contain that value. Bitmaps allow for quick filtering operations because 
they are convenient for quickly applying AND and OR operators. Also known as 
inverted indexes.
 
-To get a concrete sense of these data structures, consider the ‘page’
-column from the example data above.  The three data structures that
-represent this dimension are illustrated in the diagram below.
+To get a better sense of these data structures, consider the ‘page’ column 
from the given example data as represented by the following data structures:
 
 ```
-1: Dictionary that encodes column values
-  {
+1: Dictionary
+   {
     "Justin Bieber": 0,
     "Ke$ha":         1
-  }
+   }
 
-2: Column data
-  [0,
+2: List of column data
+   [0,
    0,
    1,
    1]
 
-3: Bitmaps - one for each unique value of the column
-  value="Justin Bieber": [1,1,0,0]
-  value="Ke$ha":         [0,0,1,1]
+3: Bitmaps
+   value="Justin Bieber": [1,1,0,0]
+   value="Ke$ha":         [0,0,1,1]
 ```
 
-Note that the bitmap is different from the first two data structures:
-whereas the first two grow linearly in the size of the data (in the
-worst case), the size of the bitmap section is the product of data
-size * column cardinality. Compression will help us here though
-because we know that for each row in 'column data', there will only be a
-single bitmap that has non-zero entry. This means that high cardinality
-columns will have extremely sparse, and therefore highly compressible,
-bitmaps. Druid exploits this using compression algorithms that are
-specially suited for bitmaps, such as roaring bitmap compression.
+Note that the bitmap is different from the dictionary and list data 
structures: the dictionary and list grow linearly with the size of the data, 
but the size of the bitmap section is the product of data size * column 
cardinality. 
+
+For each row in the list of column data, there is only a single bitmap that 
has a non-zero entry. This means that high cardinality columns have extremely 
sparse, and therefore highly compressible, bitmaps. Druid exploits this using 
compression algorithms that are specially suited for bitmaps, such as [Roaring 
bitmap compression](https://github.com/RoaringBitmap/RoaringBitmap).
+
+## Handling null values
+
+By default, Druid string dimension columns use the values `''` and `null` 
interchangeably and numeric and metric columns can not represent `null` at all, 
instead coercing nulls to `0`. However, Druid also provides a SQL compatible 
null handling mode, which you can enable at the system level, through 
`druid.generic.useDefaultValueForNull`. This setting, when set to `false`, 
allows Druid to create segments _at ingestion time_ in which the string columns 
can distinguish `''` from `null`, and numeric columns which can represent 
`null` valued rows instead of `0`.
+
+String dimension columns contain no additional column structures in this mode, 
instead they reserve an additional dictionary entry for the `null` value. 
Numeric columns are stored in the segment with an additional bitmap in which 
the set bits indicate `null` valued rows. 
+
+In addition to slightly increased segment sizes, SQL compatible null handling 
can incur a performance cost at query time, due to the need to check the null 
bitmap. This performance cost only occurs for columns that actually contain 
null values.
+
+## Segments with different schemas
+
+Druid segments for the same datasource may have different schemas. If a string 
column (dimension) exists in one segment but not another, queries that involve 
both segments still work. Queries for the segment without the dimension behave 
as if the dimension contains only null values. Similarly, if one segment has a 
numeric column (metric) but another does not, queries on the segment without 
the metric generally operate as expected. Aggregations over the missing metric 
operate as if the metric doesn't exist.
+
+## Column format
+
+Each column is stored as two parts:
+
+- A Jackson-serialized ColumnDescriptor.
+- The rest of the binary for the column.
+
+A ColumnDescriptor is an object that allows the use of Jackson's polymorphic 
deserialization to add new and interesting methods of serialization with 
minimal impact to the code. It consists of some metadata about the column (for 
example: type, whether it's multi-value) and a list of 
serialization/deserialization logic that can deserialize the rest of the binary.
 
 ### Multi-value columns
 
-If a data source makes use of multi-value columns, then the data
-structures within the segment files look a bit different. Let's
-imagine that in the example above, the second row were tagged with
-both the 'Ke$ha' *and* 'Justin Bieber' topics. In this case, the three
-data structures would now look as follows:
+If a data source uses multi-value columns, then the data structures within the 
segment files look a bit different. Let's imagine that in the example above, 
the second row is tagged with both the `Ke$ha` *and* `Justin Bieber` topics, as 
follows:

Review Comment:
   Maybe start with explaining what a multi-value column is:
   
   | A multi-value column allows a single row to contain multiple strings for a 
column. Think of it as an array of strings.



##########
docs/design/segments.md:
##########
@@ -23,231 +23,198 @@ title: "Segments"
   -->
 
 
-Apache Druid stores its index in *segment files*, which are partitioned by
-time. In a basic setup, one segment file is created for each time
+Apache Druid stores its index in *segment files* partitioned by
+time. In a basic setup, Druid creates one segment file for each time
 interval, where the time interval is configurable in the
 `segmentGranularity` parameter of the
-[`granularitySpec`](../ingestion/ingestion-spec.md#granularityspec).  For 
Druid to
-operate well under heavy query load, it is important for the segment
+[`granularitySpec`](../ingestion/ingestion-spec.md#granularityspec).
+
+For Druid to operate well under heavy query load, it is important for the 
segment
 file size to be within the recommended range of 300MB-700MB. If your
 segment files are larger than this range, then consider either
 changing the granularity of the time interval or partitioning your
-data and tweaking the `targetRowsPerSegment` in your `partitionsSpec`
-(a good starting point for this parameter is 5 million rows).  See the
-sharding section below and the 'Partitioning specification' section of
+data and adjusting the `targetRowsPerSegment` in your `partitionsSpec`.
+A good starting point for this parameter is 5 million rows.
+
+See the Sharding section below and the 'Partitioning specification' section of
 the [Batch ingestion](../ingestion/hadoop.md#partitionsspec) documentation
-for more information.
+for more guidance.
 
-### A segment file's core data structures
+## Segment file structure
 
-Here we describe the internal structure of segment files, which is
-essentially *columnar*: the data for each column is laid out in
-separate data structures. By storing each column separately, Druid can
-decrease query latency by scanning only those columns actually needed
-for a query.  There are three basic column types: the timestamp
-column, dimension columns, and metric columns, as illustrated in the
-image below:
+Segment files are *columnar*: the data for each column is laid out in
+separate data structures. By storing each column separately, Druid decreases 
query latency by scanning only those columns actually needed for a query.  
There are three basic column types: timestamp, dimensions, and metrics:
 
 ![Druid column types](../assets/druid-column-types.png "Druid Column Types")
 
-The timestamp and metric columns are simple: behind the scenes each of
-these is an array of integer or floating point values compressed with
-LZ4. Once a query knows which rows it needs to select, it simply
-decompresses these, pulls out the relevant rows, and applies the
-desired aggregation operator. As with all columns, if a query doesn’t
-require a column, then that column’s data is just skipped over.
+Timestamp and metrics type columns are arrays of integer or floating point 
values compressed with
+[LZ4](https://github.com/lz4/lz4-java). Once a query identifies which rows to 
select, it decompresses them, pulls out the relevant rows, and applies the
+desired aggregation operator. If a query doesn’t require a column, Druid skips 
over that column's data.
 
-Dimensions columns are different because they support filter and
+Dimension columns are different because they support filter and
 group-by operations, so each dimension requires the following
 three data structures:
 
-1. A dictionary that maps values (which are always treated as strings) to 
integer IDs,
-2. A list of the column’s values, encoded using the dictionary in 1, and
-3. For each distinct value in the column, a bitmap that indicates which rows 
contain that value.
-
-
-Why these three data structures? The dictionary simply maps string
-values to integer ids so that the values in \(2\) and \(3\) can be
-represented compactly. The bitmaps in \(3\) -- also known as *inverted
-indexes* allow for quick filtering operations (specifically, bitmaps
-are convenient for quickly applying AND and OR operators). Finally,
-the list of values in \(2\) is needed for *group by* and *TopN*
-queries. In other words, queries that solely aggregate metrics based
-on filters do not need to touch the list of dimension values stored in \(2\).
+- Dictionary: Maps values (which are always treated as strings) to integer 
IDs, allowing compact representation of the list and bitmap values.
+- List: The column’s values, encoded using the dictionary. Required for 
GroupBy and TopN queries. These operators allow queries that solely aggregate 
metrics based on filters to run without accessing the list of values.
+- Bitmap: One bitmap for each distinct value in the column, to indicate which 
rows contain that value. Bitmaps allow for quick filtering operations because 
they are convenient for quickly applying AND and OR operators. Also known as 
inverted indexes.
 
-To get a concrete sense of these data structures, consider the ‘page’
-column from the example data above.  The three data structures that
-represent this dimension are illustrated in the diagram below.
+To get a better sense of these data structures, consider the ‘page’ column 
from the given example data as represented by the following data structures:
 
 ```
-1: Dictionary that encodes column values
-  {
+1: Dictionary
+   {
     "Justin Bieber": 0,
     "Ke$ha":         1
-  }
+   }
 
-2: Column data
-  [0,
+2: List of column data
+   [0,
    0,
    1,
    1]
 
-3: Bitmaps - one for each unique value of the column
-  value="Justin Bieber": [1,1,0,0]
-  value="Ke$ha":         [0,0,1,1]
+3: Bitmaps
+   value="Justin Bieber": [1,1,0,0]
+   value="Ke$ha":         [0,0,1,1]
 ```
 
-Note that the bitmap is different from the first two data structures:
-whereas the first two grow linearly in the size of the data (in the
-worst case), the size of the bitmap section is the product of data
-size * column cardinality. Compression will help us here though
-because we know that for each row in 'column data', there will only be a
-single bitmap that has non-zero entry. This means that high cardinality
-columns will have extremely sparse, and therefore highly compressible,
-bitmaps. Druid exploits this using compression algorithms that are
-specially suited for bitmaps, such as roaring bitmap compression.
+Note that the bitmap is different from the dictionary and list data 
structures: the dictionary and list grow linearly with the size of the data, 
but the size of the bitmap section is the product of data size * column 
cardinality. 
+
+For each row in the list of column data, there is only a single bitmap that 
has a non-zero entry. This means that high cardinality columns have extremely 
sparse, and therefore highly compressible, bitmaps. Druid exploits this using 
compression algorithms that are specially suited for bitmaps, such as [Roaring 
bitmap compression](https://github.com/RoaringBitmap/RoaringBitmap).
+
+## Handling null values
+
+By default, Druid string dimension columns use the values `''` and `null` 
interchangeably and numeric and metric columns can not represent `null` at all, 
instead coercing nulls to `0`. However, Druid also provides a SQL compatible 
null handling mode, which you can enable at the system level, through 
`druid.generic.useDefaultValueForNull`. This setting, when set to `false`, 
allows Druid to create segments _at ingestion time_ in which the string columns 
can distinguish `''` from `null`, and numeric columns which can represent 
`null` valued rows instead of `0`.

Review Comment:
   This is a tricky area! As it turns out, Druid does both, and the effect is 
throughout the system, not just in the storage layer.
   
   In "SQL-compatible" mode, blanks and `NULL` values are distinct: a column 
can be `NULL`, `''` or `'foo`'. In "replace nulls with blanks" (legacy) mode. 
blanks are considered to be `NULL`, there is no `NULL` value, and it is 
impossible to store a blank string.
   
   Same story with numbers for 0 and `NULL`.
   
   This option must be set at the time that Druid is first installed. Choose 
wisely as behavior will be surprising if the setting is changed once the system 
contains data.
   
   This means that users should decide, when first installing Druid, if their 
app requires `NULL` (unknown) values, or if the incoming data uses blanks & 
zeros for missing values. Configure Druid accordingly. After that, the data 
stored in the system, and the computation engine, will all work consistently 
with that choice. In SQL, in non-SQL compatible mode (i.e. 
`useDefaultValueForNull=true`, a `NULL` constant in SQL will be treated as 
either a blank string or zero, depending on the data type.
   
   This topic really deserves its own section or page, since it pretty much 
means that Druid is two different systems and the user must choose which to use 
at the first installation.



##########
docs/design/segments.md:
##########
@@ -23,231 +23,198 @@ title: "Segments"
   -->
 
 
-Apache Druid stores its index in *segment files*, which are partitioned by
-time. In a basic setup, one segment file is created for each time
+Apache Druid stores its index in *segment files* partitioned by
+time. In a basic setup, Druid creates one segment file for each time
 interval, where the time interval is configurable in the
 `segmentGranularity` parameter of the
-[`granularitySpec`](../ingestion/ingestion-spec.md#granularityspec).  For 
Druid to
-operate well under heavy query load, it is important for the segment
+[`granularitySpec`](../ingestion/ingestion-spec.md#granularityspec).
+
+For Druid to operate well under heavy query load, it is important for the 
segment
 file size to be within the recommended range of 300MB-700MB. If your
 segment files are larger than this range, then consider either
 changing the granularity of the time interval or partitioning your
-data and tweaking the `targetRowsPerSegment` in your `partitionsSpec`
-(a good starting point for this parameter is 5 million rows).  See the
-sharding section below and the 'Partitioning specification' section of
+data and adjusting the `targetRowsPerSegment` in your `partitionsSpec`.
+A good starting point for this parameter is 5 million rows.
+
+See the Sharding section below and the 'Partitioning specification' section of
 the [Batch ingestion](../ingestion/hadoop.md#partitionsspec) documentation
-for more information.
+for more guidance.
 
-### A segment file's core data structures
+## Segment file structure
 
-Here we describe the internal structure of segment files, which is
-essentially *columnar*: the data for each column is laid out in
-separate data structures. By storing each column separately, Druid can
-decrease query latency by scanning only those columns actually needed
-for a query.  There are three basic column types: the timestamp
-column, dimension columns, and metric columns, as illustrated in the
-image below:
+Segment files are *columnar*: the data for each column is laid out in
+separate data structures. By storing each column separately, Druid decreases 
query latency by scanning only those columns actually needed for a query.  
There are three basic column types: timestamp, dimensions, and metrics:
 
 ![Druid column types](../assets/druid-column-types.png "Druid Column Types")
 
-The timestamp and metric columns are simple: behind the scenes each of
-these is an array of integer or floating point values compressed with
-LZ4. Once a query knows which rows it needs to select, it simply
-decompresses these, pulls out the relevant rows, and applies the
-desired aggregation operator. As with all columns, if a query doesn’t
-require a column, then that column’s data is just skipped over.
+Timestamp and metrics type columns are arrays of integer or floating point 
values compressed with
+[LZ4](https://github.com/lz4/lz4-java). Once a query identifies which rows to 
select, it decompresses them, pulls out the relevant rows, and applies the
+desired aggregation operator. If a query doesn’t require a column, Druid skips 
over that column's data.
 
-Dimensions columns are different because they support filter and
+Dimension columns are different because they support filter and
 group-by operations, so each dimension requires the following
 three data structures:
 
-1. A dictionary that maps values (which are always treated as strings) to 
integer IDs,
-2. A list of the column’s values, encoded using the dictionary in 1, and
-3. For each distinct value in the column, a bitmap that indicates which rows 
contain that value.
-
-
-Why these three data structures? The dictionary simply maps string
-values to integer ids so that the values in \(2\) and \(3\) can be
-represented compactly. The bitmaps in \(3\) -- also known as *inverted
-indexes* allow for quick filtering operations (specifically, bitmaps
-are convenient for quickly applying AND and OR operators). Finally,
-the list of values in \(2\) is needed for *group by* and *TopN*
-queries. In other words, queries that solely aggregate metrics based
-on filters do not need to touch the list of dimension values stored in \(2\).
+- Dictionary: Maps values (which are always treated as strings) to integer 
IDs, allowing compact representation of the list and bitmap values.
+- List: The column’s values, encoded using the dictionary. Required for 
GroupBy and TopN queries. These operators allow queries that solely aggregate 
metrics based on filters to run without accessing the list of values.
+- Bitmap: One bitmap for each distinct value in the column, to indicate which 
rows contain that value. Bitmaps allow for quick filtering operations because 
they are convenient for quickly applying AND and OR operators. Also known as 
inverted indexes.
 
-To get a concrete sense of these data structures, consider the ‘page’
-column from the example data above.  The three data structures that
-represent this dimension are illustrated in the diagram below.
+To get a better sense of these data structures, consider the ‘page’ column 
from the given example data as represented by the following data structures:
 
 ```
-1: Dictionary that encodes column values
-  {
+1: Dictionary
+   {
     "Justin Bieber": 0,
     "Ke$ha":         1
-  }
+   }
 
-2: Column data
-  [0,
+2: List of column data
+   [0,
    0,
    1,
    1]
 
-3: Bitmaps - one for each unique value of the column
-  value="Justin Bieber": [1,1,0,0]
-  value="Ke$ha":         [0,0,1,1]
+3: Bitmaps
+   value="Justin Bieber": [1,1,0,0]
+   value="Ke$ha":         [0,0,1,1]
 ```
 
-Note that the bitmap is different from the first two data structures:
-whereas the first two grow linearly in the size of the data (in the
-worst case), the size of the bitmap section is the product of data
-size * column cardinality. Compression will help us here though
-because we know that for each row in 'column data', there will only be a
-single bitmap that has non-zero entry. This means that high cardinality
-columns will have extremely sparse, and therefore highly compressible,
-bitmaps. Druid exploits this using compression algorithms that are
-specially suited for bitmaps, such as roaring bitmap compression.
+Note that the bitmap is different from the dictionary and list data 
structures: the dictionary and list grow linearly with the size of the data, 
but the size of the bitmap section is the product of data size * column 
cardinality. 

Review Comment:
   "Note that the" --> "The"
   
   I believe that the dictionary grows linearly with the *cardinality* of the 
data (number of unique values.)



##########
docs/design/segments.md:
##########
@@ -23,231 +23,198 @@ title: "Segments"
   -->
 
 
-Apache Druid stores its index in *segment files*, which are partitioned by
-time. In a basic setup, one segment file is created for each time
+Apache Druid stores its index in *segment files* partitioned by
+time. In a basic setup, Druid creates one segment file for each time
 interval, where the time interval is configurable in the
 `segmentGranularity` parameter of the
-[`granularitySpec`](../ingestion/ingestion-spec.md#granularityspec).  For 
Druid to
-operate well under heavy query load, it is important for the segment
+[`granularitySpec`](../ingestion/ingestion-spec.md#granularityspec).
+
+For Druid to operate well under heavy query load, it is important for the 
segment
 file size to be within the recommended range of 300MB-700MB. If your
 segment files are larger than this range, then consider either
 changing the granularity of the time interval or partitioning your
-data and tweaking the `targetRowsPerSegment` in your `partitionsSpec`
-(a good starting point for this parameter is 5 million rows).  See the
-sharding section below and the 'Partitioning specification' section of
+data and adjusting the `targetRowsPerSegment` in your `partitionsSpec`.
+A good starting point for this parameter is 5 million rows.
+
+See the Sharding section below and the 'Partitioning specification' section of
 the [Batch ingestion](../ingestion/hadoop.md#partitionsspec) documentation
-for more information.
+for more guidance.
 
-### A segment file's core data structures
+## Segment file structure
 
-Here we describe the internal structure of segment files, which is
-essentially *columnar*: the data for each column is laid out in
-separate data structures. By storing each column separately, Druid can
-decrease query latency by scanning only those columns actually needed
-for a query.  There are three basic column types: the timestamp
-column, dimension columns, and metric columns, as illustrated in the
-image below:
+Segment files are *columnar*: the data for each column is laid out in
+separate data structures. By storing each column separately, Druid decreases 
query latency by scanning only those columns actually needed for a query.  
There are three basic column types: timestamp, dimensions, and metrics:
 
 ![Druid column types](../assets/druid-column-types.png "Druid Column Types")
 
-The timestamp and metric columns are simple: behind the scenes each of
-these is an array of integer or floating point values compressed with
-LZ4. Once a query knows which rows it needs to select, it simply
-decompresses these, pulls out the relevant rows, and applies the
-desired aggregation operator. As with all columns, if a query doesn’t
-require a column, then that column’s data is just skipped over.
+Timestamp and metrics type columns are arrays of integer or floating point 
values compressed with
+[LZ4](https://github.com/lz4/lz4-java). Once a query identifies which rows to 
select, it decompresses them, pulls out the relevant rows, and applies the
+desired aggregation operator. If a query doesn’t require a column, Druid skips 
over that column's data.
 
-Dimensions columns are different because they support filter and
+Dimension columns are different because they support filter and
 group-by operations, so each dimension requires the following
 three data structures:
 
-1. A dictionary that maps values (which are always treated as strings) to 
integer IDs,
-2. A list of the column’s values, encoded using the dictionary in 1, and
-3. For each distinct value in the column, a bitmap that indicates which rows 
contain that value.
-
-
-Why these three data structures? The dictionary simply maps string
-values to integer ids so that the values in \(2\) and \(3\) can be
-represented compactly. The bitmaps in \(3\) -- also known as *inverted
-indexes* allow for quick filtering operations (specifically, bitmaps
-are convenient for quickly applying AND and OR operators). Finally,
-the list of values in \(2\) is needed for *group by* and *TopN*
-queries. In other words, queries that solely aggregate metrics based
-on filters do not need to touch the list of dimension values stored in \(2\).
+- Dictionary: Maps values (which are always treated as strings) to integer 
IDs, allowing compact representation of the list and bitmap values.
+- List: The column’s values, encoded using the dictionary. Required for 
GroupBy and TopN queries. These operators allow queries that solely aggregate 
metrics based on filters to run without accessing the list of values.
+- Bitmap: One bitmap for each distinct value in the column, to indicate which 
rows contain that value. Bitmaps allow for quick filtering operations because 
they are convenient for quickly applying AND and OR operators. Also known as 
inverted indexes.
 
-To get a concrete sense of these data structures, consider the ‘page’
-column from the example data above.  The three data structures that
-represent this dimension are illustrated in the diagram below.
+To get a better sense of these data structures, consider the ‘page’ column 
from the given example data as represented by the following data structures:
 
 ```
-1: Dictionary that encodes column values
-  {
+1: Dictionary
+   {
     "Justin Bieber": 0,
     "Ke$ha":         1
-  }
+   }
 
-2: Column data
-  [0,
+2: List of column data
+   [0,
    0,
    1,
    1]
 
-3: Bitmaps - one for each unique value of the column
-  value="Justin Bieber": [1,1,0,0]
-  value="Ke$ha":         [0,0,1,1]
+3: Bitmaps
+   value="Justin Bieber": [1,1,0,0]
+   value="Ke$ha":         [0,0,1,1]
 ```
 
-Note that the bitmap is different from the first two data structures:
-whereas the first two grow linearly in the size of the data (in the
-worst case), the size of the bitmap section is the product of data
-size * column cardinality. Compression will help us here though
-because we know that for each row in 'column data', there will only be a
-single bitmap that has non-zero entry. This means that high cardinality
-columns will have extremely sparse, and therefore highly compressible,
-bitmaps. Druid exploits this using compression algorithms that are
-specially suited for bitmaps, such as roaring bitmap compression.
+Note that the bitmap is different from the dictionary and list data 
structures: the dictionary and list grow linearly with the size of the data, 
but the size of the bitmap section is the product of data size * column 
cardinality. 
+
+For each row in the list of column data, there is only a single bitmap that 
has a non-zero entry. This means that high cardinality columns have extremely 
sparse, and therefore highly compressible, bitmaps. Druid exploits this using 
compression algorithms that are specially suited for bitmaps, such as [Roaring 
bitmap compression](https://github.com/RoaringBitmap/RoaringBitmap).
+
+## Handling null values
+
+By default, Druid string dimension columns use the values `''` and `null` 
interchangeably and numeric and metric columns can not represent `null` at all, 
instead coercing nulls to `0`. However, Druid also provides a SQL compatible 
null handling mode, which you can enable at the system level, through 
`druid.generic.useDefaultValueForNull`. This setting, when set to `false`, 
allows Druid to create segments _at ingestion time_ in which the string columns 
can distinguish `''` from `null`, and numeric columns which can represent 
`null` valued rows instead of `0`.
+
+String dimension columns contain no additional column structures in this mode, 
instead they reserve an additional dictionary entry for the `null` value. 
Numeric columns are stored in the segment with an additional bitmap in which 
the set bits indicate `null` valued rows. 
+
+In addition to slightly increased segment sizes, SQL compatible null handling 
can incur a performance cost at query time, due to the need to check the null 
bitmap. This performance cost only occurs for columns that actually contain 
null values.
+
+## Segments with different schemas
+
+Druid segments for the same datasource may have different schemas. If a string 
column (dimension) exists in one segment but not another, queries that involve 
both segments still work. Queries for the segment without the dimension behave 
as if the dimension contains only null values. Similarly, if one segment has a 
numeric column (metric) but another does not, queries on the segment without 
the metric generally operate as expected. Aggregations over the missing metric 
operate as if the metric doesn't exist.
+
+## Column format
+
+Each column is stored as two parts:
+
+- A Jackson-serialized ColumnDescriptor.

Review Comment:
   Suggestion: back-quote `ColumnDescriptor`. Then, maybe explain it: A 
Jackson-serialized instance of the internal Druid 
[`ColumnDescriptor`](https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/segment/column/ColumnDescriptor.java)
 class. (Extra credit for a link to that class on Github.)



##########
docs/ingestion/compaction.md:
##########
@@ -82,7 +82,8 @@ If you configure query granularity in compaction to go from a 
finer granularity
 
 ### Dimension handling
 
-Apache Druid supports schema changes. Therefore, dimensions can be different 
across segments even if they are a part of the same data source. See [Different 
schemas among 
segments](../design/segments.md#different-schemas-among-segments). If the input 
segments have different dimensions, the resulting compacted segment include all 
dimensions of the input segments. 
+Apache Druid supports schema changes. Therefore, dimensions can be different 
across segments even if they are a part of the same data source. See [Different 
schemas among segments](../design/segments.md#segments-with-different-schemas). 
If the input segments have different dimensions, the resulting compacted 
segment include all dimensions of the input segments. 
+

Review Comment:
   "include all dimensions of the input segments" --> "includes the union of 
all columns across all the input segments"
   
   What happens if column `c` occurs in two input segments, but with differing 
types? Would be good to ask a Druid engineer what happens, then record that 
here.



##########
docs/design/segments.md:
##########
@@ -23,231 +23,198 @@ title: "Segments"
   -->
 
 
-Apache Druid stores its index in *segment files*, which are partitioned by
-time. In a basic setup, one segment file is created for each time
+Apache Druid stores its index in *segment files* partitioned by
+time. In a basic setup, Druid creates one segment file for each time
 interval, where the time interval is configurable in the
 `segmentGranularity` parameter of the
-[`granularitySpec`](../ingestion/ingestion-spec.md#granularityspec).  For 
Druid to
-operate well under heavy query load, it is important for the segment
+[`granularitySpec`](../ingestion/ingestion-spec.md#granularityspec).
+
+For Druid to operate well under heavy query load, it is important for the 
segment
 file size to be within the recommended range of 300MB-700MB. If your
 segment files are larger than this range, then consider either
 changing the granularity of the time interval or partitioning your
-data and tweaking the `targetRowsPerSegment` in your `partitionsSpec`
-(a good starting point for this parameter is 5 million rows).  See the
-sharding section below and the 'Partitioning specification' section of
+data and adjusting the `targetRowsPerSegment` in your `partitionsSpec`.
+A good starting point for this parameter is 5 million rows.
+
+See the Sharding section below and the 'Partitioning specification' section of
 the [Batch ingestion](../ingestion/hadoop.md#partitionsspec) documentation
-for more information.
+for more guidance.
 
-### A segment file's core data structures
+## Segment file structure
 
-Here we describe the internal structure of segment files, which is
-essentially *columnar*: the data for each column is laid out in
-separate data structures. By storing each column separately, Druid can
-decrease query latency by scanning only those columns actually needed
-for a query.  There are three basic column types: the timestamp
-column, dimension columns, and metric columns, as illustrated in the
-image below:
+Segment files are *columnar*: the data for each column is laid out in
+separate data structures. By storing each column separately, Druid decreases 
query latency by scanning only those columns actually needed for a query.  
There are three basic column types: timestamp, dimensions, and metrics:
 
 ![Druid column types](../assets/druid-column-types.png "Druid Column Types")
 
-The timestamp and metric columns are simple: behind the scenes each of
-these is an array of integer or floating point values compressed with
-LZ4. Once a query knows which rows it needs to select, it simply
-decompresses these, pulls out the relevant rows, and applies the
-desired aggregation operator. As with all columns, if a query doesn’t
-require a column, then that column’s data is just skipped over.
+Timestamp and metrics type columns are arrays of integer or floating point 
values compressed with
+[LZ4](https://github.com/lz4/lz4-java). Once a query identifies which rows to 
select, it decompresses them, pulls out the relevant rows, and applies the
+desired aggregation operator. If a query doesn’t require a column, Druid skips 
over that column's data.
 
-Dimensions columns are different because they support filter and
+Dimension columns are different because they support filter and
 group-by operations, so each dimension requires the following
 three data structures:
 
-1. A dictionary that maps values (which are always treated as strings) to 
integer IDs,
-2. A list of the column’s values, encoded using the dictionary in 1, and
-3. For each distinct value in the column, a bitmap that indicates which rows 
contain that value.
-
-
-Why these three data structures? The dictionary simply maps string
-values to integer ids so that the values in \(2\) and \(3\) can be
-represented compactly. The bitmaps in \(3\) -- also known as *inverted
-indexes* allow for quick filtering operations (specifically, bitmaps
-are convenient for quickly applying AND and OR operators). Finally,
-the list of values in \(2\) is needed for *group by* and *TopN*
-queries. In other words, queries that solely aggregate metrics based
-on filters do not need to touch the list of dimension values stored in \(2\).
+- Dictionary: Maps values (which are always treated as strings) to integer 
IDs, allowing compact representation of the list and bitmap values.
+- List: The column’s values, encoded using the dictionary. Required for 
GroupBy and TopN queries. These operators allow queries that solely aggregate 
metrics based on filters to run without accessing the list of values.
+- Bitmap: One bitmap for each distinct value in the column, to indicate which 
rows contain that value. Bitmaps allow for quick filtering operations because 
they are convenient for quickly applying AND and OR operators. Also known as 
inverted indexes.
 
-To get a concrete sense of these data structures, consider the ‘page’
-column from the example data above.  The three data structures that
-represent this dimension are illustrated in the diagram below.
+To get a better sense of these data structures, consider the ‘page’ column 
from the given example data as represented by the following data structures:
 
 ```
-1: Dictionary that encodes column values
-  {
+1: Dictionary
+   {
     "Justin Bieber": 0,
     "Ke$ha":         1
-  }
+   }
 
-2: Column data
-  [0,
+2: List of column data
+   [0,
    0,
    1,
    1]
 
-3: Bitmaps - one for each unique value of the column
-  value="Justin Bieber": [1,1,0,0]
-  value="Ke$ha":         [0,0,1,1]
+3: Bitmaps
+   value="Justin Bieber": [1,1,0,0]
+   value="Ke$ha":         [0,0,1,1]
 ```
 
-Note that the bitmap is different from the first two data structures:
-whereas the first two grow linearly in the size of the data (in the
-worst case), the size of the bitmap section is the product of data
-size * column cardinality. Compression will help us here though
-because we know that for each row in 'column data', there will only be a
-single bitmap that has non-zero entry. This means that high cardinality
-columns will have extremely sparse, and therefore highly compressible,
-bitmaps. Druid exploits this using compression algorithms that are
-specially suited for bitmaps, such as roaring bitmap compression.
+Note that the bitmap is different from the dictionary and list data 
structures: the dictionary and list grow linearly with the size of the data, 
but the size of the bitmap section is the product of data size * column 
cardinality. 
+
+For each row in the list of column data, there is only a single bitmap that 
has a non-zero entry. This means that high cardinality columns have extremely 
sparse, and therefore highly compressible, bitmaps. Druid exploits this using 
compression algorithms that are specially suited for bitmaps, such as [Roaring 
bitmap compression](https://github.com/RoaringBitmap/RoaringBitmap).
+
+## Handling null values
+
+By default, Druid string dimension columns use the values `''` and `null` 
interchangeably and numeric and metric columns can not represent `null` at all, 
instead coercing nulls to `0`. However, Druid also provides a SQL compatible 
null handling mode, which you can enable at the system level, through 
`druid.generic.useDefaultValueForNull`. This setting, when set to `false`, 
allows Druid to create segments _at ingestion time_ in which the string columns 
can distinguish `''` from `null`, and numeric columns which can represent 
`null` valued rows instead of `0`.
+
+String dimension columns contain no additional column structures in this mode, 
instead they reserve an additional dictionary entry for the `null` value. 
Numeric columns are stored in the segment with an additional bitmap in which 
the set bits indicate `null` valued rows. 
+
+In addition to slightly increased segment sizes, SQL compatible null handling 
can incur a performance cost at query time, due to the need to check the null 
bitmap. This performance cost only occurs for columns that actually contain 
null values.
+
+## Segments with different schemas
+
+Druid segments for the same datasource may have different schemas. If a string 
column (dimension) exists in one segment but not another, queries that involve 
both segments still work. Queries for the segment without the dimension behave 
as if the dimension contains only null values. Similarly, if one segment has a 
numeric column (metric) but another does not, queries on the segment without 
the metric generally operate as expected. Aggregations over the missing metric 
operate as if the metric doesn't exist.
+
+## Column format
+
+Each column is stored as two parts:
+
+- A Jackson-serialized ColumnDescriptor.
+- The rest of the binary for the column.
+
+A ColumnDescriptor is an object that allows the use of Jackson's polymorphic 
deserialization to add new and interesting methods of serialization with 
minimal impact to the code. It consists of some metadata about the column (for 
example: type, whether it's multi-value) and a list of 
serialization/deserialization logic that can deserialize the rest of the binary.
 
 ### Multi-value columns
 
-If a data source makes use of multi-value columns, then the data
-structures within the segment files look a bit different. Let's
-imagine that in the example above, the second row were tagged with
-both the 'Ke$ha' *and* 'Justin Bieber' topics. In this case, the three
-data structures would now look as follows:
+If a data source uses multi-value columns, then the data structures within the 
segment files look a bit different. Let's imagine that in the example above, 
the second row is tagged with both the `Ke$ha` *and* `Justin Bieber` topics, as 
follows:
 
 ```
-1: Dictionary that encodes column values
-  {
+1: Dictionary
+   {
     "Justin Bieber": 0,
     "Ke$ha":         1
-  }
+   }
 
-2: Column data
-  [0,
-   [0,1],  <--Row value of multi-value column can have array of values
+2: List of column data
+   [0,
+   [0,1],  <--Row value in a multi-value column can contain an array of values
    1,
    1]
 
-3: Bitmaps - one for each unique value
-  value="Justin Bieber": [1,1,0,0]
-  value="Ke$ha":         [0,1,1,1]
+3: Bitmaps
+   value="Justin Bieber": [1,1,0,0]
+   value="Ke$ha":         [0,1,1,1]
                             ^
                             |
                             |
-    Multi-value column has multiple non-zero entries
+   Multi-value column contains multiple non-zero entries
 ```
 
-Note the changes to the second row in the column data and the Ke$ha
+Note the changes to the second row in the list of column data and the `Ke$ha`
 bitmap. If a row has more than one value for a column, its entry in
-the 'column data' is an array of values. Additionally, a row with *n*
-values in 'column data' will have *n* non-zero valued entries in
-bitmaps.
-
-## SQL Compatible Null Handling
-By default, Druid string dimension columns use the values `''` and `null` 
interchangeably and numeric and metric columns can not represent `null` at all, 
instead coercing nulls to `0`. However, Druid also provides a SQL compatible 
null handling mode, which must be enabled at the system level, through 
`druid.generic.useDefaultValueForNull`. This setting, when set to `false`, will 
allow Druid to _at ingestion time_ create segments whose string columns can 
distinguish `''` from `null`, and numeric columns which can represent `null` 
valued rows instead of `0`.
-
-String dimension columns contain no additional column structures in this mode, 
instead just reserving an additional dictionary entry for the `null` value. 
Numeric columns however will be stored in the segment with an additional bitmap 
whose set bits indicate `null` valued rows. In addition to slightly increased 
segment sizes, SQL compatible null handling can incur a performance cost at 
query time as well, due to the need to check the null bitmap. This performance 
cost only occurs for columns that actually contain nulls.
-
-## Naming Convention
-
-Identifiers for segments are typically constructed using the segment 
datasource, interval start time (in ISO 8601 format), interval end time (in ISO 
8601 format), and a version. If data is additionally sharded beyond a time 
range, the segment identifier will also contain a partition number.
-
-An example segment identifier may be:
-datasource_intervalStart_intervalEnd_version_partitionNum
-
-## Segment Components
+the list is an array of values. Additionally, a row with *n* values in the 
list has *n* non-zero valued entries in bitmaps.
 
-Behind the scenes, a segment is comprised of several files, listed below.
+## Compression
 
-* `version.bin`
+Druid uses LZ4 by default to compress blocks of values for string, long, 
float, and double columns. Druid uses Roaring to compress bitmaps for string 
columns and numeric null values. We recommend that you use these defaults 
unless you've experimented with your data and query patterns suggest that 
non-default options will perform better in your specific case. 
 
-    4 bytes representing the current segment version as an integer. E.g., for 
v9 segments, the version is 0x0, 0x0, 0x0, 0x9
+For bitmap in string columns, the differences between using Roaring and 
Concise are most pronounced for high cardinality columns. In this case, Roaring 
is substantially faster on filters that match a lot of values, but in some 
cases Concise can have a lower footprint due to the overhead of the Roaring 
format (but is still slower when a lot of values are matched). You configure 
compression at the segment level, not for individual columns. See 
[IndexSpec](../ingestion/ingestion-spec.md#indexspec) for more details.
 
-* `meta.smoosh`
+## Segment identification
 
-    A file with metadata (filenames and offsets) about the contents of the 
other `smoosh` files
+Segment identifiers typically contain the segment datasource, interval start 
time (in ISO 8601 format), interval end time (in ISO 8601 format), and version 
information. If data is additionally sharded beyond a time range, the segment 
identifier also contains a partition number:
 
-* `XXXXX.smoosh`
+`datasource_intervalStart_intervalEnd_version_partitionNum`
 
-    There are some number of these files, which are concatenated binary data
+### Segment ID examples
 
-    The `smoosh` files represent multiple files "smooshed" together in order 
to minimize the number of file descriptors that must be open to house the data. 
They are files of up to 2GB in size (to match the limit of a memory mapped 
ByteBuffer in Java). The `smoosh` files house individual files for each of the 
columns in the data as well as an `index.drd` file with extra metadata about 
the segment.
+The increasing partition numbers in the following segments indicate that 
multiple segments exist for the same interval:
 
-    There is also a special column called `__time` that refers to the time 
column of the segment. This will hopefully become less and less special as the 
code evolves, but for now it’s as special as my Mommy always told me I am.
+```
+foo_2015-01-01/2015-01-02_v1_0

Review Comment:
   Since `/` is the Unix/Linux directory path separator, the actual file name 
_probably_ uses an underscore. The format shown here may be for S3. The local 
file naming format (on my Mac) seems much different. Maybe get an update from 
the Druid folks?



##########
docs/design/segments.md:
##########
@@ -23,231 +23,198 @@ title: "Segments"
   -->
 
 
-Apache Druid stores its index in *segment files*, which are partitioned by
-time. In a basic setup, one segment file is created for each time
+Apache Druid stores its index in *segment files* partitioned by
+time. In a basic setup, Druid creates one segment file for each time
 interval, where the time interval is configurable in the
 `segmentGranularity` parameter of the
-[`granularitySpec`](../ingestion/ingestion-spec.md#granularityspec).  For 
Druid to
-operate well under heavy query load, it is important for the segment
+[`granularitySpec`](../ingestion/ingestion-spec.md#granularityspec).
+
+For Druid to operate well under heavy query load, it is important for the 
segment
 file size to be within the recommended range of 300MB-700MB. If your
 segment files are larger than this range, then consider either
 changing the granularity of the time interval or partitioning your
-data and tweaking the `targetRowsPerSegment` in your `partitionsSpec`
-(a good starting point for this parameter is 5 million rows).  See the
-sharding section below and the 'Partitioning specification' section of
+data and adjusting the `targetRowsPerSegment` in your `partitionsSpec`.
+A good starting point for this parameter is 5 million rows.
+
+See the Sharding section below and the 'Partitioning specification' section of
 the [Batch ingestion](../ingestion/hadoop.md#partitionsspec) documentation
-for more information.
+for more guidance.
 
-### A segment file's core data structures
+## Segment file structure
 
-Here we describe the internal structure of segment files, which is
-essentially *columnar*: the data for each column is laid out in
-separate data structures. By storing each column separately, Druid can
-decrease query latency by scanning only those columns actually needed
-for a query.  There are three basic column types: the timestamp
-column, dimension columns, and metric columns, as illustrated in the
-image below:
+Segment files are *columnar*: the data for each column is laid out in
+separate data structures. By storing each column separately, Druid decreases 
query latency by scanning only those columns actually needed for a query.  
There are three basic column types: timestamp, dimensions, and metrics:
 
 ![Druid column types](../assets/druid-column-types.png "Druid Column Types")
 
-The timestamp and metric columns are simple: behind the scenes each of
-these is an array of integer or floating point values compressed with
-LZ4. Once a query knows which rows it needs to select, it simply
-decompresses these, pulls out the relevant rows, and applies the
-desired aggregation operator. As with all columns, if a query doesn’t
-require a column, then that column’s data is just skipped over.
+Timestamp and metrics type columns are arrays of integer or floating point 
values compressed with
+[LZ4](https://github.com/lz4/lz4-java). Once a query identifies which rows to 
select, it decompresses them, pulls out the relevant rows, and applies the
+desired aggregation operator. If a query doesn’t require a column, Druid skips 
over that column's data.
 
-Dimensions columns are different because they support filter and
+Dimension columns are different because they support filter and
 group-by operations, so each dimension requires the following
 three data structures:
 
-1. A dictionary that maps values (which are always treated as strings) to 
integer IDs,
-2. A list of the column’s values, encoded using the dictionary in 1, and
-3. For each distinct value in the column, a bitmap that indicates which rows 
contain that value.
-
-
-Why these three data structures? The dictionary simply maps string
-values to integer ids so that the values in \(2\) and \(3\) can be
-represented compactly. The bitmaps in \(3\) -- also known as *inverted
-indexes* allow for quick filtering operations (specifically, bitmaps
-are convenient for quickly applying AND and OR operators). Finally,
-the list of values in \(2\) is needed for *group by* and *TopN*
-queries. In other words, queries that solely aggregate metrics based
-on filters do not need to touch the list of dimension values stored in \(2\).
+- Dictionary: Maps values (which are always treated as strings) to integer 
IDs, allowing compact representation of the list and bitmap values.
+- List: The column’s values, encoded using the dictionary. Required for 
GroupBy and TopN queries. These operators allow queries that solely aggregate 
metrics based on filters to run without accessing the list of values.
+- Bitmap: One bitmap for each distinct value in the column, to indicate which 
rows contain that value. Bitmaps allow for quick filtering operations because 
they are convenient for quickly applying AND and OR operators. Also known as 
inverted indexes.
 
-To get a concrete sense of these data structures, consider the ‘page’
-column from the example data above.  The three data structures that
-represent this dimension are illustrated in the diagram below.
+To get a better sense of these data structures, consider the ‘page’ column 
from the given example data as represented by the following data structures:
 
 ```
-1: Dictionary that encodes column values
-  {
+1: Dictionary
+   {
     "Justin Bieber": 0,
     "Ke$ha":         1
-  }
+   }
 
-2: Column data
-  [0,
+2: List of column data
+   [0,
    0,
    1,
    1]
 
-3: Bitmaps - one for each unique value of the column
-  value="Justin Bieber": [1,1,0,0]
-  value="Ke$ha":         [0,0,1,1]
+3: Bitmaps
+   value="Justin Bieber": [1,1,0,0]
+   value="Ke$ha":         [0,0,1,1]
 ```
 
-Note that the bitmap is different from the first two data structures:
-whereas the first two grow linearly in the size of the data (in the
-worst case), the size of the bitmap section is the product of data
-size * column cardinality. Compression will help us here though
-because we know that for each row in 'column data', there will only be a
-single bitmap that has non-zero entry. This means that high cardinality
-columns will have extremely sparse, and therefore highly compressible,
-bitmaps. Druid exploits this using compression algorithms that are
-specially suited for bitmaps, such as roaring bitmap compression.
+Note that the bitmap is different from the dictionary and list data 
structures: the dictionary and list grow linearly with the size of the data, 
but the size of the bitmap section is the product of data size * column 
cardinality. 
+
+For each row in the list of column data, there is only a single bitmap that 
has a non-zero entry. This means that high cardinality columns have extremely 
sparse, and therefore highly compressible, bitmaps. Druid exploits this using 
compression algorithms that are specially suited for bitmaps, such as [Roaring 
bitmap compression](https://github.com/RoaringBitmap/RoaringBitmap).
+
+## Handling null values
+
+By default, Druid string dimension columns use the values `''` and `null` 
interchangeably and numeric and metric columns can not represent `null` at all, 
instead coercing nulls to `0`. However, Druid also provides a SQL compatible 
null handling mode, which you can enable at the system level, through 
`druid.generic.useDefaultValueForNull`. This setting, when set to `false`, 
allows Druid to create segments _at ingestion time_ in which the string columns 
can distinguish `''` from `null`, and numeric columns which can represent 
`null` valued rows instead of `0`.
+
+String dimension columns contain no additional column structures in this mode, 
instead they reserve an additional dictionary entry for the `null` value. 
Numeric columns are stored in the segment with an additional bitmap in which 
the set bits indicate `null` valued rows. 
+
+In addition to slightly increased segment sizes, SQL compatible null handling 
can incur a performance cost at query time, due to the need to check the null 
bitmap. This performance cost only occurs for columns that actually contain 
null values.
+
+## Segments with different schemas
+
+Druid segments for the same datasource may have different schemas. If a string 
column (dimension) exists in one segment but not another, queries that involve 
both segments still work. Queries for the segment without the dimension behave 
as if the dimension contains only null values. Similarly, if one segment has a 
numeric column (metric) but another does not, queries on the segment without 
the metric generally operate as expected. Aggregations over the missing metric 
operate as if the metric doesn't exist.
+
+## Column format
+
+Each column is stored as two parts:
+
+- A Jackson-serialized ColumnDescriptor.
+- The rest of the binary for the column.
+
+A ColumnDescriptor is an object that allows the use of Jackson's polymorphic 
deserialization to add new and interesting methods of serialization with 
minimal impact to the code. It consists of some metadata about the column (for 
example: type, whether it's multi-value) and a list of 
serialization/deserialization logic that can deserialize the rest of the binary.
 
 ### Multi-value columns
 
-If a data source makes use of multi-value columns, then the data
-structures within the segment files look a bit different. Let's
-imagine that in the example above, the second row were tagged with
-both the 'Ke$ha' *and* 'Justin Bieber' topics. In this case, the three
-data structures would now look as follows:
+If a data source uses multi-value columns, then the data structures within the 
segment files look a bit different. Let's imagine that in the example above, 
the second row is tagged with both the `Ke$ha` *and* `Justin Bieber` topics, as 
follows:
 
 ```
-1: Dictionary that encodes column values
-  {
+1: Dictionary
+   {
     "Justin Bieber": 0,
     "Ke$ha":         1
-  }
+   }
 
-2: Column data
-  [0,
-   [0,1],  <--Row value of multi-value column can have array of values
+2: List of column data
+   [0,
+   [0,1],  <--Row value in a multi-value column can contain an array of values
    1,
    1]
 
-3: Bitmaps - one for each unique value
-  value="Justin Bieber": [1,1,0,0]
-  value="Ke$ha":         [0,1,1,1]
+3: Bitmaps
+   value="Justin Bieber": [1,1,0,0]
+   value="Ke$ha":         [0,1,1,1]
                             ^
                             |
                             |
-    Multi-value column has multiple non-zero entries
+   Multi-value column contains multiple non-zero entries
 ```
 
-Note the changes to the second row in the column data and the Ke$ha
+Note the changes to the second row in the list of column data and the `Ke$ha`
 bitmap. If a row has more than one value for a column, its entry in
-the 'column data' is an array of values. Additionally, a row with *n*
-values in 'column data' will have *n* non-zero valued entries in
-bitmaps.
-
-## SQL Compatible Null Handling
-By default, Druid string dimension columns use the values `''` and `null` 
interchangeably and numeric and metric columns can not represent `null` at all, 
instead coercing nulls to `0`. However, Druid also provides a SQL compatible 
null handling mode, which must be enabled at the system level, through 
`druid.generic.useDefaultValueForNull`. This setting, when set to `false`, will 
allow Druid to _at ingestion time_ create segments whose string columns can 
distinguish `''` from `null`, and numeric columns which can represent `null` 
valued rows instead of `0`.
-
-String dimension columns contain no additional column structures in this mode, 
instead just reserving an additional dictionary entry for the `null` value. 
Numeric columns however will be stored in the segment with an additional bitmap 
whose set bits indicate `null` valued rows. In addition to slightly increased 
segment sizes, SQL compatible null handling can incur a performance cost at 
query time as well, due to the need to check the null bitmap. This performance 
cost only occurs for columns that actually contain nulls.
-
-## Naming Convention
-
-Identifiers for segments are typically constructed using the segment 
datasource, interval start time (in ISO 8601 format), interval end time (in ISO 
8601 format), and a version. If data is additionally sharded beyond a time 
range, the segment identifier will also contain a partition number.
-
-An example segment identifier may be:
-datasource_intervalStart_intervalEnd_version_partitionNum
-
-## Segment Components
+the list is an array of values. Additionally, a row with *n* values in the 
list has *n* non-zero valued entries in bitmaps.
 
-Behind the scenes, a segment is comprised of several files, listed below.
+## Compression
 
-* `version.bin`
+Druid uses LZ4 by default to compress blocks of values for string, long, 
float, and double columns. Druid uses Roaring to compress bitmaps for string 
columns and numeric null values. We recommend that you use these defaults 
unless you've experimented with your data and query patterns suggest that 
non-default options will perform better in your specific case. 
 
-    4 bytes representing the current segment version as an integer. E.g., for 
v9 segments, the version is 0x0, 0x0, 0x0, 0x9
+For bitmap in string columns, the differences between using Roaring and 
Concise are most pronounced for high cardinality columns. In this case, Roaring 
is substantially faster on filters that match a lot of values, but in some 
cases Concise can have a lower footprint due to the overhead of the Roaring 
format (but is still slower when a lot of values are matched). You configure 
compression at the segment level, not for individual columns. See 
[IndexSpec](../ingestion/ingestion-spec.md#indexspec) for more details.

Review Comment:
   "For bitmap in string columns" --> "For string column bitmaps"
   
   Also, "a lot of" --> "many" (twice)



##########
docs/design/segments.md:
##########
@@ -23,231 +23,198 @@ title: "Segments"
   -->
 
 
-Apache Druid stores its index in *segment files*, which are partitioned by
-time. In a basic setup, one segment file is created for each time
+Apache Druid stores its index in *segment files* partitioned by
+time. In a basic setup, Druid creates one segment file for each time

Review Comment:
   Also, this sentence is a bit off. This actually has nothing to do with the 
"setup". It is actually the case that:
   
   > Druid creates a segment for each segment interval that has data. If an 
interval is empty (has no rows), then no segment will exist for that time 
interval. Druid may create multiple segments for the same interval if you 
ingest data for that period via different ingestion jobs. Compaction is the 
Druid process that attempts to combine these segments into a single one per 
segment interval for optimal performance.



##########
docs/design/segments.md:
##########
@@ -23,231 +23,198 @@ title: "Segments"
   -->
 
 
-Apache Druid stores its index in *segment files*, which are partitioned by
-time. In a basic setup, one segment file is created for each time
+Apache Druid stores its index in *segment files* partitioned by
+time. In a basic setup, Druid creates one segment file for each time
 interval, where the time interval is configurable in the
 `segmentGranularity` parameter of the
-[`granularitySpec`](../ingestion/ingestion-spec.md#granularityspec).  For 
Druid to
-operate well under heavy query load, it is important for the segment
+[`granularitySpec`](../ingestion/ingestion-spec.md#granularityspec).
+
+For Druid to operate well under heavy query load, it is important for the 
segment
 file size to be within the recommended range of 300MB-700MB. If your
 segment files are larger than this range, then consider either
 changing the granularity of the time interval or partitioning your
-data and tweaking the `targetRowsPerSegment` in your `partitionsSpec`
-(a good starting point for this parameter is 5 million rows).  See the
-sharding section below and the 'Partitioning specification' section of
+data and adjusting the `targetRowsPerSegment` in your `partitionsSpec`.
+A good starting point for this parameter is 5 million rows.
+
+See the Sharding section below and the 'Partitioning specification' section of
 the [Batch ingestion](../ingestion/hadoop.md#partitionsspec) documentation
-for more information.
+for more guidance.
 
-### A segment file's core data structures
+## Segment file structure
 
-Here we describe the internal structure of segment files, which is
-essentially *columnar*: the data for each column is laid out in
-separate data structures. By storing each column separately, Druid can
-decrease query latency by scanning only those columns actually needed
-for a query.  There are three basic column types: the timestamp
-column, dimension columns, and metric columns, as illustrated in the
-image below:
+Segment files are *columnar*: the data for each column is laid out in
+separate data structures. By storing each column separately, Druid decreases 
query latency by scanning only those columns actually needed for a query.  
There are three basic column types: timestamp, dimensions, and metrics:
 
 ![Druid column types](../assets/druid-column-types.png "Druid Column Types")
 
-The timestamp and metric columns are simple: behind the scenes each of
-these is an array of integer or floating point values compressed with
-LZ4. Once a query knows which rows it needs to select, it simply
-decompresses these, pulls out the relevant rows, and applies the
-desired aggregation operator. As with all columns, if a query doesn’t
-require a column, then that column’s data is just skipped over.
+Timestamp and metrics type columns are arrays of integer or floating point 
values compressed with
+[LZ4](https://github.com/lz4/lz4-java). Once a query identifies which rows to 
select, it decompresses them, pulls out the relevant rows, and applies the
+desired aggregation operator. If a query doesn’t require a column, Druid skips 
over that column's data.
 
-Dimensions columns are different because they support filter and
+Dimension columns are different because they support filter and
 group-by operations, so each dimension requires the following
 three data structures:
 
-1. A dictionary that maps values (which are always treated as strings) to 
integer IDs,
-2. A list of the column’s values, encoded using the dictionary in 1, and
-3. For each distinct value in the column, a bitmap that indicates which rows 
contain that value.
-
-
-Why these three data structures? The dictionary simply maps string
-values to integer ids so that the values in \(2\) and \(3\) can be
-represented compactly. The bitmaps in \(3\) -- also known as *inverted
-indexes* allow for quick filtering operations (specifically, bitmaps
-are convenient for quickly applying AND and OR operators). Finally,
-the list of values in \(2\) is needed for *group by* and *TopN*
-queries. In other words, queries that solely aggregate metrics based
-on filters do not need to touch the list of dimension values stored in \(2\).
+- Dictionary: Maps values (which are always treated as strings) to integer 
IDs, allowing compact representation of the list and bitmap values.
+- List: The column’s values, encoded using the dictionary. Required for 
GroupBy and TopN queries. These operators allow queries that solely aggregate 
metrics based on filters to run without accessing the list of values.
+- Bitmap: One bitmap for each distinct value in the column, to indicate which 
rows contain that value. Bitmaps allow for quick filtering operations because 
they are convenient for quickly applying AND and OR operators. Also known as 
inverted indexes.
 
-To get a concrete sense of these data structures, consider the ‘page’
-column from the example data above.  The three data structures that
-represent this dimension are illustrated in the diagram below.
+To get a better sense of these data structures, consider the ‘page’ column 
from the given example data as represented by the following data structures:
 
 ```
-1: Dictionary that encodes column values
-  {
+1: Dictionary
+   {
     "Justin Bieber": 0,
     "Ke$ha":         1
-  }
+   }
 
-2: Column data
-  [0,
+2: List of column data
+   [0,
    0,
    1,
    1]
 
-3: Bitmaps - one for each unique value of the column
-  value="Justin Bieber": [1,1,0,0]
-  value="Ke$ha":         [0,0,1,1]
+3: Bitmaps
+   value="Justin Bieber": [1,1,0,0]
+   value="Ke$ha":         [0,0,1,1]
 ```
 
-Note that the bitmap is different from the first two data structures:
-whereas the first two grow linearly in the size of the data (in the
-worst case), the size of the bitmap section is the product of data
-size * column cardinality. Compression will help us here though
-because we know that for each row in 'column data', there will only be a
-single bitmap that has non-zero entry. This means that high cardinality
-columns will have extremely sparse, and therefore highly compressible,
-bitmaps. Druid exploits this using compression algorithms that are
-specially suited for bitmaps, such as roaring bitmap compression.
+Note that the bitmap is different from the dictionary and list data 
structures: the dictionary and list grow linearly with the size of the data, 
but the size of the bitmap section is the product of data size * column 
cardinality. 
+
+For each row in the list of column data, there is only a single bitmap that 
has a non-zero entry. This means that high cardinality columns have extremely 
sparse, and therefore highly compressible, bitmaps. Druid exploits this using 
compression algorithms that are specially suited for bitmaps, such as [Roaring 
bitmap compression](https://github.com/RoaringBitmap/RoaringBitmap).
+
+## Handling null values
+
+By default, Druid string dimension columns use the values `''` and `null` 
interchangeably and numeric and metric columns can not represent `null` at all, 
instead coercing nulls to `0`. However, Druid also provides a SQL compatible 
null handling mode, which you can enable at the system level, through 
`druid.generic.useDefaultValueForNull`. This setting, when set to `false`, 
allows Druid to create segments _at ingestion time_ in which the string columns 
can distinguish `''` from `null`, and numeric columns which can represent 
`null` valued rows instead of `0`.
+
+String dimension columns contain no additional column structures in this mode, 
instead they reserve an additional dictionary entry for the `null` value. 
Numeric columns are stored in the segment with an additional bitmap in which 
the set bits indicate `null` valued rows. 
+
+In addition to slightly increased segment sizes, SQL compatible null handling 
can incur a performance cost at query time, due to the need to check the null 
bitmap. This performance cost only occurs for columns that actually contain 
null values.
+
+## Segments with different schemas
+
+Druid segments for the same datasource may have different schemas. If a string 
column (dimension) exists in one segment but not another, queries that involve 
both segments still work. Queries for the segment without the dimension behave 
as if the dimension contains only null values. Similarly, if one segment has a 
numeric column (metric) but another does not, queries on the segment without 
the metric generally operate as expected. Aggregations over the missing metric 
operate as if the metric doesn't exist.
+
+## Column format
+
+Each column is stored as two parts:
+
+- A Jackson-serialized ColumnDescriptor.
+- The rest of the binary for the column.

Review Comment:
   > The binary data for the column.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [druid] paul-rogers commented on a diff in pull request #12344: Segments doc refactor

Reply via email to