clintropolis commented on code in PR #16708:
URL: https://github.com/apache/druid/pull/16708#discussion_r1669660800
##########
processing/src/main/java/org/apache/druid/segment/data/FixedIndexedWriter.java:
##########
@@ -46,14 +46,16 @@ public class FixedIndexedWriter<T> implements DictionaryWriter<T>
private final Comparator<T> comparator;
private final ByteBuffer scratch;
private final ByteBuffer readBuffer;
- private int numWritten;
+ private final boolean isSorted;
+ private final int width;
+
+ private int cardinality = 0;
Review Comment:
nit: `FixedIndexed` doesn't really have to be sorted, so this is only really a
cardinality if it is sorted (and the other `Indexed` implementation writers
are still using `numWritten`)
##########
processing/src/main/java/org/apache/druid/segment/data/FixedIndexedWriter.java:
##########
@@ -197,13 +202,8 @@ private void readPage()
{
iteratorBuffer.clear();
try {
- if (numWritten - (pos - startPos) < PAGE_SIZE) {
- int size = (numWritten - (pos - startPos)) * width;
- iteratorBuffer.limit(size);
- valuesOut.readFully((long) (pos - startPos) * width, iteratorBuffer);
- } else {
- valuesOut.readFully((long) (pos - startPos) * width, iteratorBuffer);
- }
+ iteratorBuffer.limit(Math.min(PAGE_SIZE, (cardinality - pos) * width));
Review Comment:
`cardinality` includes the null value while `numWritten` did not (null values
are not stored in the values, but rather as a flag), so shouldn't this be
checking for nulls? Additionally, it might make sense to precompute a
`final int nullAdjust = hasNulls ? 1 : 0` so we can just use
`cardinality - nullAdjust` in all the places that need to adjust.
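To make the suggestion concrete, here is a minimal sketch, not the actual
`FixedIndexedWriter` code. The class and method names are hypothetical, and it
assumes that `cardinality` counts the null entry while only non-null values are
physically stored at `width` bytes each, and that the page size is given in bytes.

```java
final class NullAdjustSketch
{
  private final int width;       // bytes per stored value
  private final int nullAdjust;  // 1 if a null entry is counted but not stored, else 0
  private final int cardinality; // total entries, including the null entry if present

  NullAdjustSketch(int width, boolean hasNulls, int cardinality)
  {
    this.width = width;
    this.nullAdjust = hasNulls ? 1 : 0; // computed once, reused everywhere
    this.cardinality = cardinality;
  }

  /** Number of values physically stored (null is tracked as a flag, not a value). */
  int numStoredValues()
  {
    return cardinality - nullAdjust;
  }

  /** Bytes left to read from stored-value position pos, capped at pageSizeBytes. */
  int bytesToRead(int pos, int pageSizeBytes)
  {
    final long remaining = (long) (numStoredValues() - pos) * width;
    return (int) Math.min(pageSizeBytes, remaining);
  }

  public static void main(String[] args)
  {
    final NullAdjustSketch sketch = new NullAdjustSketch(8, true, 5);
    System.out.println(sketch.numStoredValues());     // 4
    System.out.println(sketch.bytesToRead(2, 4096));  // 16
  }
}
```

With the adjustment in one field, every size computation goes through
`numStoredValues()` rather than repeating the `hasNulls` check.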
##########
processing/src/main/java/org/apache/druid/segment/loading/MMappedQueryableSegmentizerFactory.java:
##########
@@ -34,9 +35,29 @@
*/
public class MMappedQueryableSegmentizerFactory implements SegmentizerFactory
{
+ /**
+ * A static method to make a segmentizer factory that can be serialized by Jackson. This exists because the object
+ * doesn't actually need the IndexIO object in order to be serialized by Jackson, but the public constructor
+ * *does* require it. We leave the public constructor alone because if indexIO is null, this SegmentizerFactory
+ * is largely useless, so we just create this one static method to enable creating the object for this singular
+ * use case.
+ *
+ * @return a SegmentizerFactory that can be used to be serialized by Jackson
+ */
+ @SuppressWarnings("unused")
+ public static MMappedQueryableSegmentizerFactory makeForSerialization()
Review Comment:
nit: shouldn't this just be added for whatever needs it?
##########
processing/src/main/java/org/apache/druid/query/rowsandcols/column/LongArrayColumn.java:
##########
@@ -0,0 +1,201 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.rowsandcols.column;
+
+import org.apache.druid.java.util.common.ISE;
+import org.apache.druid.java.util.common.Numbers;
+import org.apache.druid.query.rowsandcols.util.FindResult;
+import org.apache.druid.segment.column.ColumnType;
+
+import javax.annotation.Nonnull;
+import javax.annotation.Nullable;
+import java.util.Arrays;
+
+public class LongArrayColumn implements Column
+{
+ private final long[] vals;
+
+ public LongArrayColumn(
+ long[] vals
+ )
+ {
+ this.vals = vals;
+ }
+
+ @Nonnull
+ @Override
+ public ColumnAccessor toAccessor()
+ {
+ return new MyColumnAccessor();
+ }
+
+ @Nullable
+ @SuppressWarnings("unchecked")
+ @Override
+ public <T> T as(Class<? extends T> clazz)
+ {
+ if (VectorCopier.class.equals(clazz)) {
+ return (T) (VectorCopier) (into, intoStart) -> {
+ if (Integer.MAX_VALUE - vals.length < intoStart) {
+ throw new ISE(
+ "too many rows!!! intoStart[%,d], vals.length[%,d] combine to exceed max_int",
+ intoStart,
+ vals.length
+ );
+ }
+ for (int i = 0; i < vals.length; ++i) {
+ into[intoStart + i] = vals[i];
+ }
+ };
+ }
+ if (ColumnValueSwapper.class.equals(clazz)) {
+ return (T) (ColumnValueSwapper) (lhs, rhs) -> {
+ long tmp = vals[lhs];
+ vals[lhs] = vals[rhs];
+ vals[rhs] = tmp;
+ };
+ }
+ return null;
+ }
+
+ private class MyColumnAccessor implements BinarySearchableAccessor
+ {
+ @Override
+ public ColumnType getType()
+ {
+ return ColumnType.LONG;
+ }
+
+ @Override
+ public int numRows()
+ {
+ return vals.length;
+ }
+
+ @Override
+ public boolean isNull(int rowNum)
+ {
+ return false;
+ }
+
+ @Override
+ public Object getObject(int rowNum)
+ {
+ return vals[rowNum];
+ }
+
+ @Override
+ public double getDouble(int rowNum)
+ {
+ return vals[rowNum];
+ }
+
+ @Override
+ public float getFloat(int rowNum)
+ {
+ return vals[rowNum];
+ }
+
+ @Override
+ public long getLong(int rowNum)
+ {
+ return vals[rowNum];
+ }
+
+ @Override
+ public int getInt(int rowNum)
+ {
+ return (int) vals[rowNum];
+ }
+
+ @Override
+ public int compareRows(int lhsRowNum, int rhsRowNum)
+ {
+ return Long.compare(vals[lhsRowNum], vals[rhsRowNum]);
+ }
+
+
+ @Override
+ public FindResult findNull(int startIndex, int endIndex)
+ {
+ return FindResult.notFound(endIndex);
+ }
+
+ @Override
+ public FindResult findDouble(int startIndex, int endIndex, double val)
+ {
+ return findLong(startIndex, endIndex, (int) val);
Review Comment:
why cast `double` to `int` instead of `long`?
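For illustration, a small standalone sketch of why the narrower cast matters
(the class and helper names are made up for this example): Java's
`double`-to-`int` conversion saturates at `Integer.MAX_VALUE` for out-of-range
values, so a search key above that limit would silently become a different
number before reaching `findLong`.

```java
final class NarrowingCastSketch
{
  /** Mirrors the pattern in the diff: cast to int, then widen to long. */
  static long viaInt(double val)
  {
    return (int) val;
  }

  /** Casts straight to long, which preserves values beyond the int range. */
  static long viaLong(double val)
  {
    return (long) val;
  }

  public static void main(String[] args)
  {
    final double val = 3_000_000_000.0; // larger than Integer.MAX_VALUE

    System.out.println(viaInt(val));   // 2147483647
    System.out.println(viaLong(val));  // 3000000000
  }
}
```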
##########
processing/src/main/java/org/apache/druid/segment/data/VByte.java:
##########
@@ -58,6 +60,40 @@ public static int readInt(ByteBuffer buffer)
return v;
}
+ public static int writeInt(WritableByteChannel out, int val) throws IOException
+ {
+ final byte[] bytes = new byte[5];
+ final int numBytes;
+ if (val < (1 << 7)) {
+ bytes[0] = (byte) (val | (1 << 7));
+ numBytes = 1;
+ } else if (val < (1 << 14)) {
+ bytes[0] = extract7bits(0, val);
+ bytes[1] = (byte) (extract7bitsmaskless(1, (val)) | (1 << 7));
+ numBytes = 2;
+ } else if (val < (1 << 21)) {
+ bytes[0] = extract7bits(0, val);
+ bytes[1] = extract7bits(1, val);
+ bytes[2] = (byte) (extract7bitsmaskless(2, (val)) | (1 << 7));
+ numBytes = 3;
+ } else if (val < (1 << 28)) {
+ bytes[0] = extract7bits(0, val);
+ bytes[1] = extract7bits(1, val);
+ bytes[2] = extract7bits(2, val);
+ bytes[3] = (byte) (extract7bitsmaskless(3, (val)) | (1 << 7));
+ numBytes = 4;
+ } else {
+ bytes[0] = extract7bits(0, val);
+ bytes[1] = extract7bits(1, val);
+ bytes[2] = extract7bits(2, val);
+ bytes[3] = extract7bits(3, val);
+ bytes[4] = (byte) (extract7bitsmaskless(4, (val)) | (1 << 7));
+ numBytes = 5;
+ }
+ out.write(ByteBuffer.wrap(bytes, 0, numBytes));
Review Comment:
hmm, is this really worth having a duplicate implementation instead of using
the other one that writes to a buffer?
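A rough sketch of what delegating could look like. This is not Druid's `VByte`
class: the buffer-based encoder below is a stand-in written for this example,
it assumes non-negative input and the byte layout visible in the diff (7 bits
per byte, low-order group first, high bit set on the final byte), and the
returned byte count is also an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;

final class VByteChannelSketch
{
  /** Stand-in for an existing buffer-based encoder; assumes val >= 0. */
  static int writeInt(ByteBuffer buffer, int val)
  {
    final int start = buffer.position();
    // Emit 7 bits at a time, low-order group first.
    while (val >= (1 << 7)) {
      buffer.put((byte) (val & 0x7F));
      val >>>= 7;
    }
    // The final byte carries the high bit as the terminator.
    buffer.put((byte) (val | (1 << 7)));
    return buffer.position() - start;
  }

  /** Channel variant that reuses the buffer variant instead of duplicating it. */
  static int writeInt(WritableByteChannel out, int val) throws IOException
  {
    final ByteBuffer scratch = ByteBuffer.allocate(5); // an int needs at most 5 groups of 7 bits
    final int numBytes = writeInt(scratch, val);
    scratch.flip();
    while (scratch.hasRemaining()) {
      out.write(scratch);
    }
    return numBytes;
  }

  /** Convenience wrapper for trying the channel variant against an in-memory sink. */
  static byte[] encode(int val)
  {
    final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try {
      writeInt(Channels.newChannel(bytes), val);
    }
    catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  public static void main(String[] args)
  {
    System.out.println(encode(300).length); // 2
  }
}
```

The scratch allocation per call is the obvious cost here; a caller-supplied or
reused buffer would avoid it if this ends up on a hot path.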
##########
processing/src/main/java/org/apache/druid/query/rowsandcols/column/LongArrayColumn.java:
##########
@@ -0,0 +1,201 @@
+ @Override
+ public FindResult findFloat(int startIndex, int endIndex, float val)
+ {
+ return findLong(startIndex, endIndex, (int) val);
+ }
+
+ @Override
+ public FindResult findLong(int startIndex, int endIndex, long val)
+ {
+ if (vals[startIndex] == val) {
+ int end = startIndex + 1;
+
+ while (end < endIndex && vals[end] == val) {
+ ++end;
+ }
+ return FindResult.found(startIndex, end);
+ }
+
+ int i = Arrays.binarySearch(vals, startIndex, endIndex, val);
Review Comment:
I know this isn't really new since the others do this too, but is there any
guarantee that `vals` is sorted?
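To show why this matters, a small standalone example (names are made up for
illustration): `Arrays.binarySearch` documents its result as undefined when the
range is not sorted, and in practice it can report a value as missing even
though it is present. A cheap sortedness check like the hypothetical
`isSorted` below is one way to assert the precondition in tests.

```java
import java.util.Arrays;

final class UnsortedBinarySearchSketch
{
  /** Returns true if vals[from, to) is in non-decreasing order. */
  static boolean isSorted(long[] vals, int from, int to)
  {
    for (int i = from + 1; i < to; i++) {
      if (vals[i - 1] > vals[i]) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args)
  {
    final long[] unsorted = {5, 1, 4, 2, 3};

    // 5 is present at index 0, but the search only probes the upper half
    // of the range and never looks there, so the result is not reliable.
    final int i = Arrays.binarySearch(unsorted, 0, unsorted.length, 5L);
    System.out.println("found=" + (i >= 0) + ", sorted=" + isSorted(unsorted, 0, unsorted.length));

    final long[] sorted = unsorted.clone();
    Arrays.sort(sorted);
    System.out.println(Arrays.binarySearch(sorted, 0, sorted.length, 5L)); // 4
  }
}
```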
##########
processing/src/main/java/org/apache/druid/query/rowsandcols/column/LongArrayColumn.java:
##########
@@ -0,0 +1,201 @@
+ @Override
+ public FindResult findLong(int startIndex, int endIndex, long val)
+ {
+ if (vals[startIndex] == val) {
+ int end = startIndex + 1;
+
+ while (end < endIndex && vals[end] == val) {
+ ++end;
+ }
+ return FindResult.found(startIndex, end);
+ }
+
+ int i = Arrays.binarySearch(vals, startIndex, endIndex, val);
+ if (i > 0) {
+ int foundStart = i;
+ int foundEnd = i + 1;
+
+ while (foundStart - 1 >= startIndex && vals[foundStart - 1] == val) {
+ --foundStart;
+ }
+
+ while (foundEnd < endIndex && vals[foundEnd] == val) {
+ ++foundEnd;
+ }
+
+ return FindResult.found(foundStart, foundEnd);
+ } else {
+ return FindResult.notFound(-(i + 1));
+ }
+ }
+
+ public FindResult findInt(int startIndex, int endIndex, int val)
+ {
+ return findLong(startIndex, endIndex, val);
+ }
+
+ @Override
+ public FindResult findString(int startIndex, int endIndex, String val)
+ {
+ return findLong(startIndex, endIndex, (int) Numbers.tryParseLong(val, 0));
+ }
+
+ @Override
+ public FindResult findComplex(int startIndex, int endIndex, Object val)
+ {
+ return findLong(startIndex, endIndex, Numbers.tryParseLong(val, 0));
Review Comment:
I know this isn't really new, but why doesn't this need to distinguish
between the case where `val` is parseable into a number and the case where it
is not?
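A sketch of the concern, assuming `Numbers.tryParseLong(val, 0)` falls back to
the supplied default when parsing fails. The helpers below are hypothetical
stand-ins, not Druid APIs: with a default of `0`, an unparseable value and a
genuine `0` become the same search key, whereas an `OptionalLong` keeps the two
cases apart so the caller could return "not found" for the former.

```java
import java.util.OptionalLong;

final class ParseOrMissSketch
{
  /** Hypothetical helper: empty when val cannot be read as a long. */
  static OptionalLong parseLong(Object val)
  {
    if (val instanceof Number) {
      return OptionalLong.of(((Number) val).longValue());
    }
    if (val instanceof String) {
      try {
        return OptionalLong.of(Long.parseLong((String) val));
      }
      catch (NumberFormatException e) {
        return OptionalLong.empty();
      }
    }
    return OptionalLong.empty();
  }

  /** Stand-in for a default-on-failure parse with 0 as the fallback. */
  static long parseOrZero(Object val)
  {
    return parseLong(val).orElse(0L);
  }

  public static void main(String[] args)
  {
    // Both collapse to the same key, so a search for "hello" would match rows holding 0.
    System.out.println(parseOrZero("hello") == parseOrZero("0")); // true

    // The optional form lets the caller tell them apart.
    System.out.println(parseLong("hello").isPresent()); // false
    System.out.println(parseLong("0").isPresent());     // true
  }
}
```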
##########
processing/src/main/java/org/apache/druid/query/rowsandcols/concrete/ColumnHolderRACColumn.java:
##########
@@ -91,7 +91,7 @@ public int numRows()
public boolean isNull(int rowNum)
{
offset.set(rowNum);
- return valueSelector.isNull();
+ return valueSelector.getObject() == null;
Review Comment:
Does this have a different contract than `BaseNullableColumnValueSelector`?
Its `isNull` method means that the value returned by
`getLong`/`getFloat`/`getDouble` should be treated as null, not necessarily
that `getObject` is null. E.g. sometimes non-numeric selectors will implement
these primitive getters without the rows necessarily being null.
Looking at some of the things that implement it, it doesn't appear to be
different, e.g. `ObjectColumnAccessorBase` has an implementation of `getLong`
that checks if the instance is a `String`, and if so, tries to parse it. This
means that `getObject` will not return null, but `getLong` isn't valid, so
there is no way to determine that the `0` it returns is actually a `null`
because we couldn't parse the string `'hello'` into a long.
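A toy model of the mismatch, not the real `ObjectColumnAccessorBase` (every
name below is hypothetical): an object-backed accessor whose `getLong` parses
strings and returns `0` on failure. An `isNull` defined as
`getObject() == null` says the `'hello'` row is not null, yet `getLong` has no
valid value for it; a check tied to the primitive getter would flag it.

```java
final class ObjectIsNullSketch
{
  private final Object[] vals;

  ObjectIsNullSketch(Object... vals)
  {
    this.vals = vals;
  }

  /** Returns the row as a Long, or null when no long value can be produced. */
  private Long tryLong(int row)
  {
    final Object v = vals[row];
    if (v instanceof Number) {
      return ((Number) v).longValue();
    }
    if (v instanceof String) {
      try {
        return Long.parseLong((String) v);
      }
      catch (NumberFormatException e) {
        return null;
      }
    }
    return null;
  }

  /** Primitive getter that falls back to 0, hiding the parse failure. */
  long getLong(int row)
  {
    final Long parsed = tryLong(row);
    return parsed == null ? 0L : parsed;
  }

  /** Null check based only on the object, as in the changed line. */
  boolean isNullByObject(int row)
  {
    return vals[row] == null;
  }

  /** Hypothetical stricter check: true when getLong has no valid value to return. */
  boolean isNullForLong(int row)
  {
    return tryLong(row) == null;
  }

  public static void main(String[] args)
  {
    final ObjectIsNullSketch col = new ObjectIsNullSketch("hello", 0L);
    System.out.println(col.getLong(0) == col.getLong(1)); // true, both are 0
    System.out.println(col.isNullByObject(0));            // false
    System.out.println(col.isNullForLong(0));             // true
  }
}
```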