This is an automated email from the ASF dual-hosted git repository.
cwylie pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git
The following commit(s) were added to refs/heads/master by this push:
new 335ff3a66b0 switch default front-coding format to v1, drop experimental from docs (#18984)
335ff3a66b0 is described below
commit 335ff3a66b01e5692b497d0e32313b030c3c03cd
Author: Clint Wylie
AuthorDate: Thu Feb 19 08:56:00 2026 -0800
switch default front-coding format to v1, drop experimental from docs (#18984)
---
docs/ingestion/ingestion-spec.md | 16 +++++-----------
docs/release-info/migr-front-coded-dict.md | 20 +++++++-------------
docs/release-info/release-notes.md | 13 -------------
docs/release-info/upgrade-notes.md | 10 ----------
.../apache/druid/segment/data/FrontCodedIndexed.java | 2 +-
.../segment/column/StringEncodingStrategyTest.java | 3 +--
6 files changed, 14 insertions(+), 50 deletions(-)
diff --git a/docs/ingestion/ingestion-spec.md b/docs/ingestion/ingestion-spec.md
index 1ba471fba8f..496c687bec1 100644
--- a/docs/ingestion/ingestion-spec.md
+++ b/docs/ingestion/ingestion-spec.md
@@ -569,16 +569,10 @@ For information on defining an `indexSpec` in a query context, see [SQL-based in
#### Front coding
-:::info
-Front coding is an [experimental feature](../development/experimental.md).
-:::
-
-Druid encodes string columns into dictionaries for better compression.
-Front coding is an incremental encoding strategy that lets you store STRING and [COMPLEX<json>](../querying/nested-columns.md) columns in Druid with minimal performance impact.
-Front-coded dictionaries reduce storage and improve performance by optimizing for strings where the front part looks similar.
+Druid stores STRING columns using dictionary encoding for better compression, where each string value is added to a lexicographically sorted dictionary and the actual column just stores a pointer to a dictionary entry.
+Front coding is an optional incremental encoding strategy that lets you further compress STRING and [COMPLEX<json>](../querying/nested-columns.md) columns in Druid with minimal performance impact.
+Front-coded dictionaries can reduce storage and improve performance by optimizing values with a shared common prefix to avoid storing duplicate data.
For example, if you are tracking website visits, most URLs start with `https://domain.xyz/`, and front coding is able to exploit this pattern for more optimal compression when storing such datasets.
-Druid performs the optimization automatically, which means that the performance of string columns is generally not affected when they don't match the front-coded pattern.
-Consequently, you can enable this feature universally without having to know the underlying data shapes of the columns.
You can use front coding with all types of ingestion.
@@ -592,7 +586,7 @@ To enable front coding, set `indexSpec.stringDictionaryEncoding.type` to `frontC
You can specify the following optional properties:
* `bucketSize`: Number of values to place in a bucket to perform delta encoding. Setting this property instructs indexing tasks to write segments using compressed dictionaries of the specified bucket size. You can set it to any power of 2 less than or equal to 128. `bucketSize` defaults to 4.
-* `formatVersion`: Specifies which front coding version to use. Options are 0 and 1 (supported for Druid versions 26.0.0 and higher). `formatVersion` defaults to 0. For faster speeds and smaller storage sizes, set `formatVersion` to 1. After setting `formatVersion` to 1, you can no longer downgrade to Druid 25.0.0 seamlessly. To downgrade to Druid 25.0.0, you must re-ingest your data with the `formatVersion` property set to 0.
+* `formatVersion`: Specifies which front coding version to use. Options are 0 and 1 (supported for Druid versions 26.0.0 and higher). `formatVersion` defaults to 1.
For example:
@@ -602,7 +596,7 @@ For example:
"stringDictionaryEncoding": {
"type":"frontCoded",
"bucketSize": 4,
- "formatVersion": 0
+ "formatVersion": 1
}
}
}
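To make the shared-prefix idea in the updated docs concrete, here is a minimal Java sketch of front coding a single sorted bucket: the first value is stored whole, and each later value is stored as "number of leading characters shared with the previous value" plus the remaining suffix. The class `FrontCodingSketch`, the `Entry` record, and the in-memory representation are hypothetical illustrations; they do not reproduce the byte layout of Druid's `FrontCodedIndexed` V0 or V1 formats.

```java
import java.util.ArrayList;
import java.util.List;

public class FrontCodingSketch
{
  // One encoded value: length of the prefix shared with the previous value, plus the rest.
  record Entry(int prefixLength, String suffix) {}

  static int commonPrefixLength(String a, String b)
  {
    int n = Math.min(a.length(), b.length());
    int i = 0;
    while (i < n && a.charAt(i) == b.charAt(i)) {
      i++;
    }
    return i;
  }

  // Encodes a lexicographically sorted bucket; the first value is stored in full,
  // every later value only stores what differs from its predecessor.
  static List<Entry> encode(List<String> sortedBucket)
  {
    List<Entry> out = new ArrayList<>();
    String previous = "";
    for (String value : sortedBucket) {
      int shared = out.isEmpty() ? 0 : commonPrefixLength(previous, value);
      out.add(new Entry(shared, value.substring(shared)));
      previous = value;
    }
    return out;
  }

  // Rebuilds the original values by re-attaching each stored prefix to its suffix.
  static List<String> decode(List<Entry> entries)
  {
    List<String> out = new ArrayList<>();
    String previous = "";
    for (Entry e : entries) {
      String value = previous.substring(0, e.prefixLength()) + e.suffix();
      out.add(value);
      previous = value;
    }
    return out;
  }

  public static void main(String[] args)
  {
    List<String> bucket = List.of(
        "https://domain.xyz/about",
        "https://domain.xyz/blog",
        "https://domain.xyz/blog/post-1",
        "https://domain.xyz/contact"
    );
    List<Entry> encoded = encode(bucket);
    encoded.forEach(e -> System.out.println(e.prefixLength() + " | " + e.suffix()));
    System.out.println("round trip ok: " + decode(encoded).equals(bucket));
  }
}
```

With URLs like the `https://domain.xyz/` example, every value after the first stores only its short tail, which is where the storage savings described in the docs come from.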
diff --git a/docs/release-info/migr-front-coded-dict.md b/docs/release-info/migr-front-coded-dict.md
index 4080d5f4709..5d8833f13b1 100644
--- a/docs/release-info/migr-front-coded-dict.md
+++ b/docs/release-info/migr-front-coded-dict.md
@@ -23,16 +23,10 @@ sidebar_label: Front-coded dictionaries
~ under the License.
-->
-:::info
-Front coding is an [experimental feature](../development/experimental.md) introduced in Druid 25.0.0.
-:::
-
-Apache Druid encodes string columns into dictionaries for better compression.
-Front coding is an incremental encoding strategy that lets you store STRING and [COMPLEX<json>](../querying/nested-columns.md) columns in Druid with minimal performance impact.
-Front-coded dictionaries reduce storage and improve performance by optimizing for strings where the front part looks similar.
+Apache Druid stores STRING columns using dictionary encoding for better compression, where each string value is added to a lexicographically sorted dictionary and the actual column just stores a pointer to a dictionary entry.
+Front coding is an optional incremental encoding strategy that lets you further compress STRING and [COMPLEX<json>](../querying/nested-columns.md) columns in Druid with minimal performance impact.
+Front-coded dictionaries can reduce storage and improve performance by optimizing values with a shared common prefix to avoid storing duplicate data.
For example, if you are tracking website visits, most URLs start with `https://domain.xyz/`, and front coding is able to exploit this pattern for more optimal compression when storing such datasets.
-Druid performs the optimization automatically, which means that the performance of string columns is generally not affected when they don't match the front-coded pattern.
-Consequently, you can enable this feature universally without having to know the underlying data shapes of the columns.
You can use front coding with all types of ingestion.
@@ -43,7 +37,7 @@ To enable front coding, set `indexSpec.stringDictionaryEncoding.type` to `frontC
You can specify the following optional properties:
* `bucketSize`: Number of values to place in a bucket to perform delta encoding. Setting this property instructs indexing tasks to write segments using compressed dictionaries of the specified bucket size. You can set it to any power of 2 less than or equal to 128. `bucketSize` defaults to 4.
-* `formatVersion`: Specifies which front coding version to use. Options are 0 and 1 (supported for Druid versions 26.0.0 and higher). `formatVersion` defaults to 0.
+* `formatVersion`: Specifies which front coding version to use. Options are 0 and 1 (the latter supported for Druid versions 26.0.0 and higher). `formatVersion` defaults to 1.
For example:
@@ -53,7 +47,7 @@ For example:
"stringDictionaryEncoding": {
"type":"frontCoded",
"bucketSize": 4,
- "formatVersion": 0
+ "formatVersion": 1
}
}
}
@@ -86,8 +80,8 @@ For API calls to the SQL-based ingestion API, include the `indexSpec` in the con
"stringDictionaryEncoding": {
"type": "frontCoded",
"bucketSize": 4,
- "formatVersion": 1}
- }
+ "formatVersion": 1
+ }
}
}
```
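The `bucketSize` rules in both docs (default 4, any power of 2 up to 128) can be expressed as a small check. This is a hedged sketch of how those documented rules could be applied; `BucketSizeSketch` and `resolveBucketSize` are hypothetical names, a `null` argument stands in for "property omitted from the indexSpec", and treating 1 (2^0) as valid is this sketch's reading of "any power of 2", not confirmed Druid behavior.

```java
public class BucketSizeSketch
{
  // Documented default for indexSpec.stringDictionaryEncoding.bucketSize.
  static final int DEFAULT_BUCKET_SIZE = 4;
  // Documented upper bound.
  static final int MAX_BUCKET_SIZE = 128;

  // Returns the effective bucket size: the default when unset (null), otherwise the
  // configured value if it is a power of 2 no larger than 128.
  static int resolveBucketSize(Integer configured)
  {
    if (configured == null) {
      return DEFAULT_BUCKET_SIZE;
    }
    int size = configured;
    // A positive int is a power of 2 exactly when it has a single bit set.
    boolean powerOfTwo = size > 0 && (size & (size - 1)) == 0;
    if (!powerOfTwo || size > MAX_BUCKET_SIZE) {
      throw new IllegalArgumentException(
          "bucketSize must be a power of 2 less than or equal to " + MAX_BUCKET_SIZE + ", got " + size
      );
    }
    return size;
  }

  public static void main(String[] args)
  {
    System.out.println("unset -> " + resolveBucketSize(null));
    System.out.println("16 -> " + resolveBucketSize(16));
    try {
      resolveBucketSize(12);
    }
    catch (IllegalArgumentException e) {
      System.out.println("12 -> rejected: " + e.getMessage());
    }
  }
}
```

The single-bit test `(size & (size - 1)) == 0` avoids a loop and is a common idiom for power-of-two checks on positive integers.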
diff --git a/docs/release-info/release-notes.md b/docs/release-info/release-notes.md
index d0573c5cbcc..7698573f110 100644
--- a/docs/release-info/release-notes.md
+++ b/docs/release-info/release-notes.md
@@ -97,19 +97,6 @@ This section contains detailed release notes separated by areas.
### Upgrade notes
-#### Front-coded dictionaries
-
-<!--Carry this forward until 32. Then move it to incompatible changes -->
-
-In Druid 32.0.0, the front coded dictionaries feature will be turned on by default. Front-coded dictionaries reduce storage and improve performance by optimizing for strings where the front part looks similar.
-
-Once this feature is on, you cannot easily downgrade to an earlier version that does not support the feature.
-
-For more information, see [Migration guide: front-coded dictionaries](./migr-front-coded-dict.md).
-
-If you're already using this feature, you don't need to take any action.
-
-
### Incompatible changes
### Developer notes
diff --git a/docs/release-info/upgrade-notes.md b/docs/release-info/upgrade-notes.md
index 440eb7d77ef..15b75867876 100644
--- a/docs/release-info/upgrade-notes.md
+++ b/docs/release-info/upgrade-notes.md
@@ -28,16 +28,6 @@ For the full release notes for a specific version, see the [releases page](https
## Announcements
-#### Front-coded dictionaries
-
-Front-coded dictionaries reduce storage and improve performance by optimizing for strings where the front part looks similar.
-
-Once this feature is on, you cannot easily downgrade to an earlier version that does not support the feature.
-
-For more information, see [Migration guide: front-coded dictionaries](./migr-front-coded-dict.md).
-
-If you're already using this feature, you don't need to take any action.
-
## 34.0.0
### Upgrade notes
diff --git a/processing/src/main/java/org/apache/druid/segment/data/FrontCodedIndexed.java b/processing/src/main/java/org/apache/druid/segment/data/FrontCodedIndexed.java
index bd541404b1d..31af80b415a 100644
--- a/processing/src/main/java/org/apache/druid/segment/data/FrontCodedIndexed.java
+++ b/processing/src/main/java/org/apache/druid/segment/data/FrontCodedIndexed.java
@@ -78,7 +78,7 @@ public abstract class FrontCodedIndexed implements Indexed<ByteBuffer>
{
public static final byte V0 = 0;
public static final byte V1 = 1;
- public static final byte DEFAULT_VERSION = V0;
+ public static final byte DEFAULT_VERSION = V1;
public static final int DEFAULT_BUCKET_SIZE = 4;
public static byte validateVersion(byte version)
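The change above flips `DEFAULT_VERSION` from `V0` to `V1`, and the removed docs explain the trade-off: front coding arrived in Druid 25.0.0, format version 1 needs 26.0.0 or later, so V1 segments cannot be read after a downgrade to 25.0.0. The following Java sketch models that defaulting and compatibility logic. The body of the real `validateVersion` is not shown in this diff, so the validation here, along with `FormatVersionSketch`, `resolveVersion`, and `minimumReaderMajorVersion`, is a hypothetical illustration rather than Druid's code.

```java
public class FormatVersionSketch
{
  static final byte V0 = 0;
  static final byte V1 = 1;
  // Mirrors the new default introduced by this commit.
  static final byte DEFAULT_VERSION = V1;

  // Hypothetical stand-in for version validation: accepts only the two known versions.
  static byte validateVersion(byte version)
  {
    if (version != V0 && version != V1) {
      throw new IllegalArgumentException("Unknown front-coded format version: " + version);
    }
    return version;
  }

  // Applies the default when formatVersion is omitted from the indexSpec (modeled as null).
  static byte resolveVersion(Byte configured)
  {
    return configured == null ? DEFAULT_VERSION : validateVersion(configured);
  }

  // Oldest Druid major version that can read segments in the given format, per the docs:
  // front coding was introduced in 25.0.0, and format version 1 requires 26.0.0.
  static int minimumReaderMajorVersion(byte version)
  {
    return validateVersion(version) == V1 ? 26 : 25;
  }

  public static void main(String[] args)
  {
    byte effective = resolveVersion(null);
    System.out.println("effective formatVersion = " + effective);
    System.out.println("readable by Druid " + minimumReaderMajorVersion(effective) + ".0.0 and later");
  }
}
```

In this model, a cluster that may still need to roll back to 25.0.0 would have to set `formatVersion` to 0 explicitly, since omitting it now resolves to 1.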
diff --git a/processing/src/test/java/org/apache/druid/segment/column/StringEncodingStrategyTest.java b/processing/src/test/java/org/apache/druid/segment/column/StringEncodingStrategyTest.java
index bd5dcfb39a4..e2b219c0525 100644
--- a/processing/src/test/java/org/apache/druid/segment/column/StringEncodingStrategyTest.java
+++ b/processing/src/test/java/org/apache/druid/segment/column/StringEncodingStrategyTest.java
@@ -54,8 +54,7 @@ public class StringEncodingStrategyTest
// this next assert seems silly, but its a sanity check to make us think hard before changing the default version,
// to make us think of the backwards compatibility implications, as new versions of segment format stuff cannot be
// downgraded to older versions of Druid and still read
- // the default version should be changed to V1 after Druid 26.0 is released
- Assert.assertEquals(FrontCodedIndexed.V0, FrontCodedIndexed.DEFAULT_VERSION);
+ Assert.assertEquals(FrontCodedIndexed.V1, FrontCodedIndexed.DEFAULT_VERSION);
}
@Test