vtlim commented on code in PR #12808:
URL: https://github.com/apache/druid/pull/12808#discussion_r926059627
##########
docs/design/segments.md:
##########
@@ -193,9 +193,7 @@ A segment contains several files:
* `XXXXX.smoosh`
- A number of files containing concatenated binary data.
-
- The `smoosh` files represent multiple files "smooshed" together in order
to minimize the number of file descriptors that must be open to house the data.
They are files of up to 2GB in size (to match the limit of a memory mapped
ByteBuffer in Java). The `smoosh` files house individual files for each of the
columns in the data as well as an `index.drd` file with extra metadata about
the segment.
+ Smoosh (`.smoosh`) files contain concatenated binary data. This file
consolidation reduces the number of file descriptors that must be open when
accessing data. The files should be 2 GB or less in size to remain within the
limit of a memory mapped `ByteBuffer` in Java. Smoosh files contain individual
files for each column in the data and an `index.drd` file that contains
additional segment metadata.
Additionally, a column called `__time` refers to the time column of the
segment.
Review Comment:
This is supposed to relate to smoosh files, right? See suggestion above.
##########
docs/design/segments.md:
##########
@@ -193,9 +193,7 @@ A segment contains several files:
* `XXXXX.smoosh`
- A number of files containing concatenated binary data.
-
- The `smoosh` files represent multiple files "smooshed" together in order
to minimize the number of file descriptors that must be open to house the data.
They are files of up to 2GB in size (to match the limit of a memory mapped
ByteBuffer in Java). The `smoosh` files house individual files for each of the
columns in the data as well as an `index.drd` file with extra metadata about
the segment.
+ Smoosh (`.smoosh`) files contain concatenated binary data. This file
consolidation reduces the number of file descriptors that must be open when
accessing data. The files should be 2 GB or less in size to remain within the
limit of a memory mapped `ByteBuffer` in Java. Smoosh files contain individual
files for each column in the data and an `index.drd` file that contains
additional segment metadata.
Review Comment:
Another suggestion, to integrate the `__time` column:
```suggestion
Smoosh (`.smoosh`) files contain concatenated binary data. This file
consolidation reduces the number of file descriptors that must be open when
accessing data. The files are 2 GB or less in size to remain within the limit
of a memory-mapped `ByteBuffer` in Java. Smoosh files contain the following:
* Individual files for each column in the data, including one for the
`__time` column that refers to the time column of the segment
* An `index.drd` file that contains additional segment metadata
```
##########
docs/design/segments.md:
##########
@@ -193,9 +193,7 @@ A segment contains several files:
* `XXXXX.smoosh`
- A number of files containing concatenated binary data.
-
- The `smoosh` files represent multiple files "smooshed" together in order
to minimize the number of file descriptors that must be open to house the data.
They are files of up to 2GB in size (to match the limit of a memory mapped
ByteBuffer in Java). The `smoosh` files house individual files for each of the
columns in the data as well as an `index.drd` file with extra metadata about
the segment.
+ Smoosh (`.smoosh`) files contain concatenated binary data. This file
consolidation reduces the number of file descriptors that must be open when
accessing data. The files should be 2 GB or less in size to remain within the
limit of a memory mapped `ByteBuffer` in Java. Smoosh files contain individual
files for each column in the data and an `index.drd` file that contains
additional segment metadata.
Review Comment:
```suggestion
Smoosh (`.smoosh`) files contain concatenated binary data. This file
consolidation reduces the number of file descriptors that must be open when
accessing data. The files are 2 GB or less in size to remain within the limit
of a memory-mapped `ByteBuffer` in Java. Smoosh files contain individual files
for each column in the data and an `index.drd` file that contains additional
segment metadata.
```
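For context on the 2 GB figure, here is a minimal sketch of the JDK limit the sentence refers to. It is not taken from the Druid code base: the class name `MmapLimitSketch` and both helper methods are hypothetical, and the only real API used is `FileChannel.map`, which rejects a mapping size greater than `Integer.MAX_VALUE` with an `IllegalArgumentException`.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; not part of Druid.
public class MmapLimitSketch {
    // A ByteBuffer is indexed by int, so one mapping covers at most 2^31 - 1 bytes.
    static final long MAX_MAPPED_BYTES = Integer.MAX_VALUE;

    // Returns true if a region of this size can be covered by a single mapping.
    static boolean fitsInOneMapping(long sizeBytes) {
        return sizeBytes >= 0 && sizeBytes <= MAX_MAPPED_BYTES;
    }

    // Tries to map `size` bytes read-only from the start of the file.
    // Returns false when the JDK rejects the requested size.
    static boolean tryMap(Path file, long size) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, size);
            return buf.capacity() == size;
        } catch (IllegalArgumentException e) {
            // Thrown when size > Integer.MAX_VALUE (or the arguments are negative).
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("sketch", ".smoosh");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, new byte[16]);

        // A small mapping is accepted.
        System.out.println(tryMap(tmp, 16));
        // One byte past the limit is rejected before any I/O happens.
        System.out.println(tryMap(tmp, MAX_MAPPED_BYTES + 1L));
    }
}
```

This is why "are 2 GB or less" reads more accurately than "should be": the cap is a hard constraint of a single memory-mapped buffer, not a tuning recommendation.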
##########
docs/design/segments.md:
##########
@@ -28,7 +28,7 @@ Apache Druid stores its data and indexes in *segment files*
partitioned by time.
The time interval is configurable in the `segmentGranularity` parameter of the
[`granularitySpec`](../ingestion/ingestion-spec.md#granularityspec).
For Druid to operate well under heavy query load, it is important for the
segment
-file size to be within the recommended range of 300MB-700MB. If your
+file size to be within the recommended range of 300 MB-700 MB. If your
Review Comment:
```suggestion
file size to be within the recommended range of 300-700 MB. If your
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]