cecemei commented on code in PR #18722: URL: https://github.com/apache/druid/pull/18722#discussion_r2604049544
########## processing/src/main/java/org/apache/druid/segment/column/BitmapIndexEncodingStrategy.java: ########## @@ -0,0 +1,165 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.druid.segment.column; + +import com.fasterxml.jackson.annotation.JsonSubTypes; +import com.fasterxml.jackson.annotation.JsonTypeInfo; +import org.apache.druid.collections.bitmap.BitmapFactory; +import org.apache.druid.collections.bitmap.ImmutableBitmap; +import org.apache.druid.collections.bitmap.MutableBitmap; +import org.apache.druid.error.DruidException; +import org.apache.druid.segment.data.GenericIndexedWriter; +import org.apache.druid.segment.file.SegmentFileBuilder; +import org.apache.druid.segment.serde.Serializer; + +import javax.annotation.Nullable; +import java.io.IOException; +import java.nio.channels.WritableByteChannel; +import java.util.Objects; + +@JsonTypeInfo(use = JsonTypeInfo.Id.NAME, property = "type") +@JsonSubTypes({ + @JsonSubTypes.Type(value = BitmapIndexEncodingStrategy.DictionaryId.class, name = "dictionaryId"), + @JsonSubTypes.Type(value = BitmapIndexEncodingStrategy.NullsOnly.class, name = "nullsOnly") +}) +public abstract class BitmapIndexEncodingStrategy implements Serializer +{ + /** + * Assigned in {@link #init(BitmapFactory, int)} + */ + @Nullable + protected MutableBitmap[] bitmaps; + /** + * Assigned in {@link #close(BitmapFactory, GenericIndexedWriter)} + */ + @Nullable + GenericIndexedWriter<ImmutableBitmap> writer; + + public abstract void init(BitmapFactory bitmapFactory, int dictionarySize); + + public abstract void add(int row, int sortedId, @Nullable Object o); + + public void close(BitmapFactory bitmapFactory, GenericIndexedWriter<ImmutableBitmap> writer) throws IOException Review Comment: renamed to finalize ########## processing/src/main/java/org/apache/druid/segment/nested/GlobalDictionaryEncodedFieldColumnWriter.java: ########## @@ -187,7 +189,11 @@ long getSerializedColumnSize() throws IOException * Defines how to write the column, including the dictionary id column, along with any additional columns * such as the long or double value column as type appropriate. */ - abstract void writeColumnTo(WritableByteChannel channel, SegmentFileBuilder fileBuilder) throws IOException; + void writeColumnTo(WritableByteChannel channel, SegmentFileBuilder fileBuilder) throws IOException + { + writeLongAndDoubleColumnLength(channel, 0, 0); + encodedValueSerializer.writeTo(channel, fileBuilder); + } Review Comment: i was jumping through classes to compare `getSerializedColumnSize` and `writeColumnTo` and thought it'd be nice if inside the abstraction class these could all match. so like either overwrite all or overwrite none. ########## processing/src/main/java/org/apache/druid/segment/nested/GlobalDictionaryEncodedFieldColumnWriter.java: ########## @@ -272,21 +272,21 @@ public long getSerializedSize() throws IOException } else { arraySize = 0; } - return 1 + Integer.BYTES + + return 1 + Integer.BYTES + // version + feature flags sortedDictionaryWriter.getSerializedSize() + - bitmapIndexWriter.getSerializedSize() + - arraySize + - getSerializedColumnSize(); + getSerializedColumnSize() + + bitmapIndexEncoding.getSerializedSize() + + arraySize; } @Override public void writeTo(WritableByteChannel channel, SegmentFileBuilder fileBuilder) throws IOException { Channels.writeFully(channel, ByteBuffer.wrap(new byte[]{version.asByte()})); - channel.write(ByteBuffer.wrap(Ints.toByteArray(flags))); + Channels.writeFully(channel, ByteBuffer.wrap(Ints.toByteArray(flags))); Review Comment: no just for consistency with line above ########## processing/src/main/java/org/apache/druid/segment/nested/NestedCommonFormatColumnFormatSpec.java: ########## @@ -172,14 +184,18 @@ public static NestedCommonFormatColumnFormatSpec getEffectiveFormatSpec( private final CompressionStrategy doubleColumnCompression; @Nullable private final BitmapSerdeFactory bitmapEncoding; + @Nullable + private final BitmapIndexEncodingStrategy numericFieldsBitmapIndexEncoding; @JsonCreator public NestedCommonFormatColumnFormatSpec( @JsonProperty("objectFieldsDictionaryEncoding") @Nullable StringEncodingStrategy objectFieldsDictionaryEncoding, + @JsonProperty("numericFieldsBitmapIndexEncoding") @Nullable BitmapIndexEncodingStrategy numericFieldsBitmapIndexEncoding, Review Comment: i was thinking more granular controls by field names, with numeric as default. kinda feel like nothing in particular that double and long might be different? adding `longFieldIndexType` and `doubleFieldIndexType` is not too much work either, can do that if you have strong opinions on this. rn i renamed to `numericFieldsBitmapIndexType` and `BitmapIndexType`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
