[GitHub] [druid] vtlim commented on a diff in pull request #12808: Remove the time bit, fix headings

GitBox Wed, 20 Jul 2022 14:16:10 -0700


vtlim commented on code in PR #12808:
URL: https://github.com/apache/druid/pull/12808#discussion_r926059627



##########
docs/design/segments.md:
##########
@@ -193,9 +193,7 @@ A segment contains several files:
 
 * `XXXXX.smoosh`
 
-    A number of files containing concatenated binary data.
-
-    The `smoosh` files represent multiple files "smooshed" together in order 
to minimize the number of file descriptors that must be open to house the data. 
They are files of up to 2GB in size (to match the limit of a memory mapped 
ByteBuffer in Java). The `smoosh` files house individual files for each of the 
columns in the data as well as an `index.drd` file with extra metadata about 
the segment.
+    Smoosh (`.smoosh`) files contain concatenated binary data. This file 
consolidation reduces the number of file descriptors that must be open when 
accessing data. The files should be 2 GB or less in size to remain within the 
limit of a memory mapped `ByteBuffer` in Java. Smoosh files contain individual 
files for each column in the data and an `index.drd` file that contains 
additional segment metadata.
 
 Additionally, a column called `__time` refers to the time column of the 
segment.

Review Comment:
   This is supposed to relate to smoosh files, right? See suggestion above.



##########
docs/design/segments.md:
##########
@@ -193,9 +193,7 @@ A segment contains several files:
 
 * `XXXXX.smoosh`
 
-    A number of files containing concatenated binary data.
-
-    The `smoosh` files represent multiple files "smooshed" together in order 
to minimize the number of file descriptors that must be open to house the data. 
They are files of up to 2GB in size (to match the limit of a memory mapped 
ByteBuffer in Java). The `smoosh` files house individual files for each of the 
columns in the data as well as an `index.drd` file with extra metadata about 
the segment.
+    Smoosh (`.smoosh`) files contain concatenated binary data. This file 
consolidation reduces the number of file descriptors that must be open when 
accessing data. The files should be 2 GB or less in size to remain within the 
limit of a memory mapped `ByteBuffer` in Java. Smoosh files contain individual 
files for each column in the data and an `index.drd` file that contains 
additional segment metadata.

Review Comment:
   Another suggestion, to integrate the `__time` column:
   ```suggestion
   Smoosh (`.smoosh`) files contain concatenated binary data. This file 
consolidation reduces the number of file descriptors that must be open when 
accessing data. The files are 2 GB or less in size to remain within the limit 
of a memory-mapped `ByteBuffer` in Java. Smoosh files contain the following:
   * Individual files for each column in the data, including one for the 
`__time` column that refers to the time column of the segment
   * An `index.drd` file that contains additional segment metadata
   ```



##########
docs/design/segments.md:
##########
@@ -193,9 +193,7 @@ A segment contains several files:
 
 * `XXXXX.smoosh`
 
-    A number of files containing concatenated binary data.
-
-    The `smoosh` files represent multiple files "smooshed" together in order 
to minimize the number of file descriptors that must be open to house the data. 
They are files of up to 2GB in size (to match the limit of a memory mapped 
ByteBuffer in Java). The `smoosh` files house individual files for each of the 
columns in the data as well as an `index.drd` file with extra metadata about 
the segment.
+    Smoosh (`.smoosh`) files contain concatenated binary data. This file 
consolidation reduces the number of file descriptors that must be open when 
accessing data. The files should be 2 GB or less in size to remain within the 
limit of a memory mapped `ByteBuffer` in Java. Smoosh files contain individual 
files for each column in the data and an `index.drd` file that contains 
additional segment metadata.

Review Comment:
   ```suggestion
       Smoosh (`.smoosh`) files contain concatenated binary data. This file 
consolidation reduces the number of file descriptors that must be open when 
accessing data. The files are 2 GB or less in size to remain within the limit 
of a memory-mapped `ByteBuffer` in Java. Smoosh files contain individual files 
for each column in the data and an `index.drd` file that contains additional 
segment metadata.
   ```
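
For context on the 2 GB figure in the suggestion above: a single memory-mapped `ByteBuffer` in Java is indexed by `int`, so one mapping cannot exceed `Integer.MAX_VALUE` bytes. The sketch below is illustrative only and is not Druid code. The class name and both helper methods are hypothetical, and it assumes the standard behavior of `FileChannel.map`, which rejects a requested size above `Integer.MAX_VALUE` with an `IllegalArgumentException`.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch, not part of Druid: illustrates the per-mapping size limit.
public class MappedBufferLimitSketch {
    // Largest region one MappedByteBuffer can cover: 2^31 - 1 bytes (just under 2 GiB).
    static final long MAX_MAPPING_BYTES = Integer.MAX_VALUE;

    // Hypothetical helper: true if a file of this size fits in a single mapping.
    static boolean fitsInSingleMapping(long sizeBytes) {
        return sizeBytes >= 0 && sizeBytes <= MAX_MAPPING_BYTES;
    }

    // Returns true if FileChannel.map rejects the requested mapping size.
    // Intended to be called with an oversized value; the temp file stays empty.
    static boolean mapRejects(long sizeBytes) throws IOException {
        Path tmp = Files.createTempFile("smoosh-sketch", ".bin");
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
            ch.map(FileChannel.MapMode.READ_ONLY, 0, sizeBytes);
            return false;
        } catch (IllegalArgumentException e) {
            // Thrown when the size is negative or exceeds Integer.MAX_VALUE.
            return true;
        } finally {
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(fitsInSingleMapping(MAX_MAPPING_BYTES));
        System.out.println(fitsInSingleMapping(MAX_MAPPING_BYTES + 1));
        System.out.println(mapRejects(MAX_MAPPING_BYTES + 1));
    }
}
```

This is why stating that the files "are" 2 GB or less reads as a hard constraint rather than a recommendation: a larger file could not be covered by one mapped buffer.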



##########
docs/design/segments.md:
##########
@@ -28,7 +28,7 @@ Apache Druid stores its data and indexes in *segment files* 
partitioned by time.
 The time interval is configurable in the `segmentGranularity` parameter of the 
[`granularitySpec`](../ingestion/ingestion-spec.md#granularityspec).
 
 For Druid to operate well under heavy query load, it is important for the 
segment
-file size to be within the recommended range of 300MB-700MB. If your
+file size to be within the recommended range of 300 MB-700 MB. If your

Review Comment:
   ```suggestion
   file size to be within the recommended range of 300-700 MB. If your
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

