clintropolis commented on code in PR #14653:
URL: https://github.com/apache/druid/pull/14653#discussion_r1275831421
##########
processing/src/main/java/org/apache/druid/frame/read/columnar/StringFrameColumnReader.java:
##########
@@ -60,16 +61,25 @@
import javax.annotation.Nullable;
import java.nio.ByteBuffer;
-import java.util.ArrayList;
+import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
+/**
+ * Reader for {@link StringFrameColumnWriter}, types {@link ColumnType#STRING}
and {@link ColumnType#STRING_ARRAY}.
Review Comment:
at first glance it does seem kind of funny that strings and string arrays
live in the same reader, but makes sense because of mvds. Any idea what other
array frame columns might look like? I imagine most array columns would look
pretty similar, it might be nice to delegate some way to read elements
generically so we could just have a shared ArrayFrameColumnReader
##########
processing/src/main/java/org/apache/druid/segment/BaseObjectColumnValueSelector.java:
##########
@@ -29,12 +31,42 @@
* BaseObjectColumnValueSelector to make it impossible to accidently call any
method other than {@link #getObject()}.
*
* All implementations of this interface MUST also implement {@link
ColumnValueSelector}.
+ *
+ * Typically created by {@link
ColumnSelectorFactory#makeColumnValueSelector(String)}.
*/
@ExtensionPoint
public interface BaseObjectColumnValueSelector<T>
{
+ /**
+ * Returns the currently-selected object.
+ *
+ * The behavior of this method depends on the type of selector, which can be
determined by calling
+ * {@link ColumnSelectorFactory#getColumnCapabilities(String)} on the same
{@link ColumnSelectorFactory} that
+ * you got this selector from. If the capabilties are nonnull, the selector
type is given by
+ * {@link ColumnCapabilities#getType()}.
+ *
+ * String selectors, where type is {@link ColumnType#STRING}, may return any
type of object from this method,
+ * especially in cases where the selector is casting objects to string at
selection time. Callers are encouraged to
+ * avoid the need to deal with various objects by using {@link
ColumnSelectorFactory#makeDimensionSelector} instead.
+ *
+ * Numeric selectors, where {@link ColumnType#isNumeric()}, may return any
type of {@link Number}. Callers that
+ * wish to deal with more specific types should treat the original {@link
ColumnValueSelector} as a
+ * {@link BaseLongColumnValueSelector}, {@link
BaseDoubleColumnValueSelector}, or
+ * {@link BaseFloatColumnValueSelector} instead.
+ *
+ * Array selectors, where {@link ColumnType#isArray()}, must return {@code
Object[]}. The array may contain
+ * null elements, and the array itself may also be null.
+ *
+ * Selectors of unknown type, where {@link
ColumnSelectorFactory#getColumnCapabilities(String)} returns null,
+ * may return any type of object. Callers must be prepared for a wide
variety of possible input objects. This case
+ * is common during ingestion, where selectors are built on top of external
data.
+ */
Review Comment:
longer term I think i want to make a dedicated array selector with methods
to either retrieve the whole array as well as methods to select specific
elements by position. I think this might help reduce some of the ambiguity of
this method, since we can then transition away from using getObject for arrays
and reserve calling this method for complex types and other special cases. I'm
not fully sure what this looks like quite yet though, and I think this seems
like a good description for now to capture the current state of things :+1:
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]