gianm commented on code in PR #17762:
URL: https://github.com/apache/druid/pull/17762#discussion_r2354498651
##########
processing/src/main/java/org/apache/druid/segment/nested/NestedCommonFormatColumn.java:
##########
@@ -125,10 +132,11 @@ public ColumnFormat merge(@Nullable ColumnFormat
otherFormat)
if (otherFormat instanceof Format) {
final Format other = (Format) otherFormat;
+ // todo (clint): actually merge columnFormatSpec, maybe
Review Comment:
please don't commit todos. Should this be done now or later?
##########
processing/src/main/java/org/apache/druid/segment/nested/NestedCommonFormatColumnFormatSpec.java:
##########
@@ -0,0 +1,338 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.nested;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import org.apache.druid.java.util.common.ISE;
+import org.apache.druid.segment.IndexSpec;
+import org.apache.druid.segment.column.StringEncodingStrategy;
+import org.apache.druid.segment.data.BitmapSerdeFactory;
+import org.apache.druid.segment.data.CompressionFactory;
+import org.apache.druid.segment.data.CompressionStrategy;
+
+import javax.annotation.Nullable;
+import java.util.Objects;
+
+public class NestedCommonFormatColumnFormatSpec
Review Comment:
It'd be nice to have a javadoc for this class talking about how it's
expected to be used.
##########
processing/src/main/java/org/apache/druid/segment/AutoTypeColumnSchema.java:
##########
@@ -84,6 +95,9 @@ public AutoTypeColumnSchema(
} else {
this.castToType = castToType;
}
+ this.columnFormatSpec = columnFormatSpec == null
Review Comment:
IMO, the user provided object should be merged with the server-wide default
object at the level of individual keys. If you set `objectStorageCompression`
here in the schema, it shouldn't override a server-wide default that was
specified for `stringDictionaryEncoding`.
##########
processing/src/main/java/org/apache/druid/segment/nested/NestedCommonFormatColumnFormatSpec.java:
##########
@@ -0,0 +1,338 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.nested;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import org.apache.druid.java.util.common.ISE;
+import org.apache.druid.segment.IndexSpec;
+import org.apache.druid.segment.column.StringEncodingStrategy;
+import org.apache.druid.segment.data.BitmapSerdeFactory;
+import org.apache.druid.segment.data.CompressionFactory;
+import org.apache.druid.segment.data.CompressionStrategy;
+
+import javax.annotation.Nullable;
+import java.util.Objects;
+
+public class NestedCommonFormatColumnFormatSpec
+{
+ public static Builder builder()
+ {
+ return new Builder();
+ }
+
+ public static NestedCommonFormatColumnFormatSpec getEffectiveFormatSpec(
+ @Nullable NestedCommonFormatColumnFormatSpec columnFormatSpec,
+ IndexSpec indexSpec
+ )
+ {
+ return Objects.requireNonNullElseGet(
+ columnFormatSpec,
+ () -> NestedCommonFormatColumnFormatSpec.builder()
+
.setObjectFieldsDictionaryEncoding(StringEncodingStrategy.UTF8_STRATEGY)
+
.setObjectStorageEncoding(ObjectStorageEncoding.SMILE)
+ .build()
+ ).getEffectiveSpec(indexSpec);
+ }
+
+ @Nullable
+ @JsonProperty
+ private final StringEncodingStrategy objectFieldsDictionaryEncoding;
+ @Nullable
+ @JsonProperty
+ private final ObjectStorageEncoding objectStorageEncoding;
+ @Nullable
+ @JsonProperty
+ private final CompressionStrategy objectStorageCompression;
+ @Nullable
+ @JsonProperty
+ private final StringEncodingStrategy stringDictionaryEncoding;
+ @Nullable
+ @JsonProperty
+ private final CompressionStrategy dictionaryEncodedColumnCompression;
+ @Nullable
+ @JsonProperty
+ private final CompressionFactory.LongEncodingStrategy longColumnEncoding;
+ @Nullable
+ @JsonProperty
+ private final CompressionStrategy longColumnCompression;
+ @Nullable
+ @JsonProperty
+ private final CompressionStrategy doubleColumnCompression;
+ @Nullable
+ @JsonProperty
+ private final BitmapSerdeFactory bitmapEncoding;
+
+ @JsonCreator
+ public NestedCommonFormatColumnFormatSpec(
+ @JsonProperty("objectFieldsDictionaryEncoding") @Nullable
StringEncodingStrategy objectFieldsDictionaryEncoding,
+ @JsonProperty("objectStorageEncoding") @Nullable ObjectStorageEncoding
objectStorageEncoding,
+ @JsonProperty("objectStorageCompression") @Nullable CompressionStrategy
objectStorageCompression,
+ @JsonProperty("stringDictionaryEncoding") @Nullable
StringEncodingStrategy stringDictionaryEncoding,
+ @JsonProperty("dictionaryEncodedColumnCompression") @Nullable
CompressionStrategy dictionaryEncodedColumnCompression,
+ @JsonProperty("longColumnEncoding") @Nullable
CompressionFactory.LongEncodingStrategy longColumnEncoding,
+ @JsonProperty("longColumnCompression") @Nullable CompressionStrategy
longColumnCompression,
+ @JsonProperty("doubleColumnCompression") @Nullable CompressionStrategy
doubleColumnCompression,
+ @JsonProperty("bitmapEncoding") @Nullable BitmapSerdeFactory
bitmapEncoding
+ )
+ {
+ this.objectFieldsDictionaryEncoding = objectFieldsDictionaryEncoding;
+ this.objectStorageEncoding = objectStorageEncoding;
+ this.objectStorageCompression = objectStorageCompression;
+ this.stringDictionaryEncoding = stringDictionaryEncoding;
+ this.dictionaryEncodedColumnCompression =
dictionaryEncodedColumnCompression;
+ this.longColumnEncoding = longColumnEncoding;
+ this.longColumnCompression = longColumnCompression;
+ this.doubleColumnCompression = doubleColumnCompression;
+ this.bitmapEncoding = bitmapEncoding;
+ }
+
+ public NestedCommonFormatColumnFormatSpec getEffectiveSpec(IndexSpec
indexSpec)
+ {
+ if (bitmapEncoding != null &&
!bitmapEncoding.equals(indexSpec.getBitmapSerdeFactory())) {
Review Comment:
Why let the `bitmapEncoding` be set at all, if it can only have one valid
value?
##########
processing/src/main/java/org/apache/druid/segment/AutoTypeColumnIndexer.java:
##########
@@ -793,10 +799,11 @@ public ColumnFormat merge(@Nullable ColumnFormat
otherFormat)
}
if (otherFormat instanceof Format) {
final Format other = (Format) otherFormat;
+ // todo (clint): actually merge column format spec, maybe?
Review Comment:
please don't commit todos
##########
processing/src/main/java/org/apache/druid/segment/nested/NestedCommonFormatColumnFormatSpec.java:
##########
@@ -0,0 +1,338 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.nested;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import org.apache.druid.java.util.common.ISE;
+import org.apache.druid.segment.IndexSpec;
+import org.apache.druid.segment.column.StringEncodingStrategy;
+import org.apache.druid.segment.data.BitmapSerdeFactory;
+import org.apache.druid.segment.data.CompressionFactory;
+import org.apache.druid.segment.data.CompressionStrategy;
+
+import javax.annotation.Nullable;
+import java.util.Objects;
+
+public class NestedCommonFormatColumnFormatSpec
+{
+ public static Builder builder()
+ {
+ return new Builder();
+ }
+
+ public static NestedCommonFormatColumnFormatSpec getEffectiveFormatSpec(
+ @Nullable NestedCommonFormatColumnFormatSpec columnFormatSpec,
+ IndexSpec indexSpec
+ )
+ {
+ return Objects.requireNonNullElseGet(
+ columnFormatSpec,
+ () -> NestedCommonFormatColumnFormatSpec.builder()
+
.setObjectFieldsDictionaryEncoding(StringEncodingStrategy.UTF8_STRATEGY)
+
.setObjectStorageEncoding(ObjectStorageEncoding.SMILE)
+ .build()
+ ).getEffectiveSpec(indexSpec);
+ }
+
+ @Nullable
+ @JsonProperty
+ private final StringEncodingStrategy objectFieldsDictionaryEncoding;
+ @Nullable
+ @JsonProperty
+ private final ObjectStorageEncoding objectStorageEncoding;
+ @Nullable
+ @JsonProperty
+ private final CompressionStrategy objectStorageCompression;
+ @Nullable
+ @JsonProperty
+ private final StringEncodingStrategy stringDictionaryEncoding;
+ @Nullable
+ @JsonProperty
+ private final CompressionStrategy dictionaryEncodedColumnCompression;
+ @Nullable
+ @JsonProperty
+ private final CompressionFactory.LongEncodingStrategy longColumnEncoding;
+ @Nullable
+ @JsonProperty
+ private final CompressionStrategy longColumnCompression;
+ @Nullable
+ @JsonProperty
+ private final CompressionStrategy doubleColumnCompression;
+ @Nullable
+ @JsonProperty
Review Comment:
Seems strange to have `@JsonProperty` on all of the fields, the constructor,
and the getters. i would think just constructor and getters is fine? Or just
fields?
##########
processing/src/main/java/org/apache/druid/segment/DefaultColumnFormatConfig.java:
##########
@@ -69,17 +77,30 @@ private static void
validateMultiValueHandlingMode(@Nullable String stringMultiV
@JsonProperty("stringMultiValueHandlingMode")
private final String stringMultiValueHandlingMode;
+ @Nullable
+ @JsonProperty
+ private final NestedCommonFormatColumnFormatSpec nestedColumnFormatSpec;
+
@JsonCreator
public DefaultColumnFormatConfig(
+ @JsonProperty("stringMultiValueHandlingMode") @Nullable String
stringMultiValueHandlingMode,
@JsonProperty("nestedColumnFormatVersion") @Nullable Integer
nestedColumnFormatVersion,
- @JsonProperty("stringMultiValueHandlingMode") @Nullable String
stringMultiValueHandlingMode
+ @JsonProperty("nestedColumnFormatSpec") @Nullable
NestedCommonFormatColumnFormatSpec nestedColumnFormatSpec
Review Comment:
If I read the code right, this applies to both nested and auto typed
columns. If that's right, the name is misleading. Also, I wonder why this
should be part of DefaultColumnFormatConfig, rather than IndexSpec? What I'm
suggesting is like:
1. IndexSpec remains the place where default encoding/compression/etc stuff
is specified. It seems useful to be able to set server-wide defaults, and IIRC
there isn't currently a way to set a server-wide default IndexSpec. But, I
don't see why there shouldn't be.
2. You can still override nestedColumnFormatSpec at the level of
`NestedDataColumnSchema` and `AutoTypeColumnSchema`.
3. Maybe rename `nestedColumnFormatSpec` to `autoTypeColumnSpec`? To me that
seems more descriptive, since all nested columns are auto typed, but all auto
typed columns are not necessarily nested.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]