Jackie-Jiang commented on code in PR #8742:
URL: https://github.com/apache/pinot/pull/8742#discussion_r878653273
##########
pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java:
##########
@@ -76,6 +77,8 @@ byte[] toBytes()
String[] getStringArray(int rowId, int colId);
+ MutableRoaringBitmap getColumnNullBitmap(int colId);
Review Comment:
(minor) Suggest renaming to `getNullRowIds()`
##########
pinot-common/src/main/java/org/apache/pinot/common/utils/DataSchema.java:
##########
@@ -173,6 +188,25 @@ public byte[] toBytes()
dataOutputStream.writeInt(bytes.length);
dataOutputStream.write(bytes);
}
+
+ // Write colum field specs.
+ // todo: consider serializing only defaultNullValues since this is the
only required info.
+ int fieldSpecsSize = _columnFieldSpecs == null ? 0 :
_columnFieldSpecs.length;
+ dataOutputStream.writeInt(fieldSpecsSize);
Review Comment:
(MAJOR) This is backward incompatible change. The new broker won't
understand the data schema returned from the old servers
##########
pinot-core/src/main/java/org/apache/pinot/core/query/reduce/DistinctDataTableReducer.java:
##########
@@ -99,11 +99,11 @@ private ResultTable reduceToResultTable(DistinctTable
distinctTable) {
Iterator<Record> iterator = distinctTable.getFinalResult();
while (iterator.hasNext()) {
Object[] values = iterator.next().getValues();
- Object[] row = new Object[numColumns];
+ Object[] rowValues = new Object[numColumns];
Review Comment:
(minor) Seems unrelated
##########
pinot-core/src/main/java/org/apache/pinot/core/query/reduce/GroupByDataTableReducer.java:
##########
@@ -136,23 +137,23 @@ private void reduceToResultTable(BrokerResponseNative
brokerResponseNative, Data
if (havingFilter != null) {
HavingFilterHandler havingFilterHandler = new
HavingFilterHandler(havingFilter, postAggregationHandler);
while (rows.size() < limit && sortedIterator.hasNext()) {
- Object[] row = sortedIterator.next().getValues();
- extractFinalAggregationResults(row);
+ Object[] values = sortedIterator.next().getValues();
Review Comment:
(minor) Suggest reverting it. I feel `row` is more specific, `values` is too
generic
##########
pinot-core/src/main/java/org/apache/pinot/core/query/distinct/raw/BaseRawIntSingleColumnDistinctExecutor.java:
##########
@@ -29,35 +28,34 @@
import org.apache.pinot.core.data.table.Record;
import org.apache.pinot.core.query.distinct.DistinctExecutor;
import org.apache.pinot.core.query.distinct.DistinctTable;
-import org.apache.pinot.spi.data.FieldSpec.DataType;
+import org.apache.pinot.spi.data.FieldSpec;
/**
* Base implementation of {@link DistinctExecutor} for single raw INT column.
*/
abstract class BaseRawIntSingleColumnDistinctExecutor implements
DistinctExecutor {
final ExpressionContext _expression;
- final DataType _dataType;
+ final FieldSpec _fieldSpec;
final int _limit;
- final IntSet _valueSet;
+ final ObjectSet<Integer> _valueSet;
Review Comment:
This can cause performance regression. We should avoid adding overhead to
existing use cases. If we want to return `null` as a separate distinct value,
we can add a boolean flag `_hasNull`
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/stats/BigDecimalColumnPreIndexStatsCollector.java:
##########
@@ -112,8 +113,13 @@ public int getCardinality() {
}
@Override
- public boolean hasNull() {
- return false;
+ public boolean hasNulls() {
Review Comment:
This API is not really used. We should rely on whether null value vector
exist and non-empty to identify whether a column contains null values
##########
pinot-common/src/main/java/org/apache/pinot/common/utils/DataSchema.java:
##########
@@ -46,6 +49,7 @@
public class DataSchema {
private final String[] _columnNames;
private final ColumnDataType[] _columnDataTypes;
+ private final FieldSpec[] _columnFieldSpecs;
Review Comment:
Trying to understand why we need `FieldSpec` in the data schema. Seems the
intention is to use it to calculate the default null value.
It should not be stored here for the following reasons:
1. We don't want to carry the field spec in the transport object
2. The column could be an expression (transform function/aggregation
function etc), and there is no existing `FieldSpec` for them
3. We should not need to calculate default null value on the fly. Filling
any value should be fine because we should skip processing them
##########
pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableBuilder.java:
##########
@@ -102,11 +110,17 @@ public DataTableBuilder(DataSchema dataSchema) {
}
public static DataTable getEmptyDataTable() {
- return _version == VERSION_2 ? new DataTableImplV2() : new
DataTableImplV3();
+ if (_version == VERSION_2) {
+ return new DataTableImplV2();
+ }
+ if (_version == VERSION_4) {
Review Comment:
(minor) Suggest checking `VERSION_3` for readability, same for `build()`
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]