LakshSingla commented on code in PR #14900: URL: https://github.com/apache/druid/pull/14900#discussion_r1345497227
##########
processing/src/main/java/org/apache/druid/frame/field/NumericArrayFieldSelector.java:
##########
@@ -0,0 +1,204 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.frame.field;
+
+import org.apache.datasketches.memory.Memory;
+import org.apache.druid.error.DruidException;
+import org.apache.druid.query.monomorphicprocessing.RuntimeShapeInspector;
+import org.apache.druid.segment.ColumnValueSelector;
+
+import javax.annotation.Nullable;
+import java.util.ArrayList;
+import java.util.List;
+
+/**
+ * Base implementation of the column value selector that the concrete numeric field reader implementations inherit from.
+ * The selector contains the logic to construct an array written by {@link NumericArrayFieldWriter}, and present it as
+ * a column value selector.
+ *
+ * The inheritors of this class are expected to implement
+ *  1. {@link #getIndividualValueAtMemory} Which extracts the element from the field where it was written to. Returns
+ *     null if the value at that location represents a null element
+ *  2. {@link #getIndividualFieldSize} Which informs the method about the field size corresponding to each element in
+ *     the numeric array's serialized representation
+ *
+ * @param <ElementType> Type of the individual array elements
+ */
+public abstract class NumericArrayFieldSelector<ElementType extends Number> implements ColumnValueSelector
+{
+  /**
+   * Memory containing the serialized values of the array
+   */
+  protected final Memory memory;
+
+  /**
+   * Pointer to location in the memory. The callers are expected to update the pointer's position to the start of the
+   * array that they wish to get prior to {@link #getObject()} call.
+   *
+   * Frames read and written using {@link org.apache.druid.frame.write.FrameWriter} and
+   * {@link org.apache.druid.frame.read.FrameReader} shouldn't worry about this detail, since they automatically update
+   * and handle the start location
+   */
+  private final ReadableFieldPointer fieldPointer;
+
+  /**
+   * Position last read, for caching the last fetched result
+   */
+  private long currentFieldPosition = -1;
+
+  /**
+   * Value of the row at the location beginning at {@link #currentFieldPosition}
+   */
+  private final List<ElementType> currentRow = new ArrayList<>();
+
+  /**
+   * Nullity of the row at the location beginning at {@link #currentFieldPosition}
+   */
+  private boolean currentRowIsNull;
+
+  public NumericArrayFieldSelector(final Memory memory, final ReadableFieldPointer fieldPointer)
+  {
+    this.memory = memory;
+    this.fieldPointer = fieldPointer;
+  }
+
+  @Override
+  public void inspectRuntimeShape(RuntimeShapeInspector inspector)
+  {
+    // Do nothing
+  }
+
+  @Nullable
+  @Override
+  public Object getObject()
+  {
+    final List<ElementType> currentArray = computeCurrentArray();

Review Comment:
> the array length after the null byte and wanting to spit out Object[].

So you are saying the format should be something like:

Begin array indicator -> Array elements -> Array terminator -> Size of array

Knowing the size of the array would help avoid the extra `List <-> Object[]` conversion while reading, since we could allocate an array of the right length up front. I hadn't considered it, and it doesn't interfere with the comparison of the fields either, since the comparison ends before the size is reached.

However, since the end of the array is not known in advance, we'd need two passes: find the rowTerminator first, then read the size, allocate the array, and re-read the elements from the bytes.

So the choice is between:

Two passes through the byte array, without any `List <-> Object[]` conversion
Or
A single pass, requiring a `List <-> Object[]` conversion
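
To make the trade-off concrete, here is a rough sketch of both read paths over the layout described above. It is an illustration only, not the code in this PR: it uses `java.nio.ByteBuffer` instead of `Memory`, assumes fixed-width `long` elements, and the class name `ArrayReadSketch`, the marker byte values, and the trailing `int` element count are all hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

final class ArrayReadSketch
{
  // Hypothetical marker bytes, chosen for this sketch only.
  static final byte NON_NULL_ROW = 0x01;
  static final byte ARRAY_TERMINATOR = 0x00;
  static final byte NULL_ELEMENT = 0x01;
  static final byte NON_NULL_ELEMENT = 0x02;
  // Each element is one marker byte followed by an 8-byte long.
  static final int ELEMENT_SIZE = 1 + Long.BYTES;

  // Writes: begin indicator -> elements -> terminator -> element count.
  static ByteBuffer write(Long... values)
  {
    ByteBuffer buf = ByteBuffer.allocate(1 + values.length * ELEMENT_SIZE + 1 + Integer.BYTES);
    buf.put(NON_NULL_ROW);
    for (Long v : values) {
      buf.put(v == null ? NULL_ELEMENT : NON_NULL_ELEMENT);
      buf.putLong(v == null ? 0L : v);
    }
    buf.put(ARRAY_TERMINATOR);
    buf.putInt(values.length);
    buf.flip();
    return buf;
  }

  // Single pass: collect into a List, then convert to Object[].
  static Object[] readSinglePass(ByteBuffer buf)
  {
    int pos = 1; // skip the begin indicator
    List<Long> row = new ArrayList<>();
    while (buf.get(pos) != ARRAY_TERMINATOR) {
      row.add(buf.get(pos) == NULL_ELEMENT ? null : buf.getLong(pos + 1));
      pos += ELEMENT_SIZE;
    }
    return row.toArray();
  }

  // Two passes: locate the terminator, read the count stored after it,
  // allocate the array once, then re-read the elements into it.
  static Object[] readTwoPass(ByteBuffer buf)
  {
    int end = 1;
    while (buf.get(end) != ARRAY_TERMINATOR) {
      end += ELEMENT_SIZE;
    }
    int size = buf.getInt(end + 1);
    Object[] row = new Object[size];
    for (int i = 0, pos = 1; i < size; i++, pos += ELEMENT_SIZE) {
      row[i] = buf.get(pos) == NULL_ELEMENT ? null : buf.getLong(pos + 1);
    }
    return row;
  }
}
```

Both methods return the same `Object[]` for the same bytes. The difference is only where the cost lands: the single-pass version pays for the intermediate `List` and the copy out of it, while the two-pass version scans the marker bytes twice.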
