gortiz commented on code in PR #11453:
URL: https://github.com/apache/pinot/pull/11453#discussion_r1308277764
##########
pinot-common/src/main/java/org/apache/pinot/common/datablock/DataBlockUtils.java:
##########
@@ -275,43 +264,91 @@ private static Object[] extractRowFromDataBlock(DataBlock dataBlock, int rowId,
* TODO: Add support for COLUMNAR format.
* @return int array of values in the column
*/
- public static int[] extractIntValuesForColumn(DataBlock dataBlock, int columnIndex) {
+ public static int[] extractIntValuesForColumn(DataBlock dataBlock, int colId) {
DataSchema dataSchema = dataBlock.getDataSchema();
- DataSchema.ColumnDataType[] columnDataTypes = dataSchema.getColumnDataTypes();
-
- // Get null bitmap for the column.
- RoaringBitmap nullBitmap = extractNullBitmaps(dataBlock)[columnIndex];
+ ColumnDataType storedType = dataSchema.getColumnDataType(colId).getStoredType();
+ RoaringBitmap nullBitmap = dataBlock.getNullRowIds(colId);
int numRows = dataBlock.getNumberOfRows();
-
- int[] rows = new int[numRows];
- for (int rowId = 0; rowId < numRows; rowId++) {
- if (nullBitmap != null && nullBitmap.contains(rowId)) {
- continue;
+ int[] values = new int[numRows];
+ if (nullBitmap == null) {
+ switch (storedType) {
+ case INT:
+ for (int rowId = 0; rowId < numRows; rowId++) {
+ values[rowId] = dataBlock.getInt(rowId, colId);
+ }
+ break;
+ case LONG:
+ for (int rowId = 0; rowId < numRows; rowId++) {
+ values[rowId] = (int) dataBlock.getLong(rowId, colId);
+ }
+ break;
+ case FLOAT:
+ for (int rowId = 0; rowId < numRows; rowId++) {
+ values[rowId] = (int) dataBlock.getFloat(rowId, colId);
+ }
+ break;
+ case DOUBLE:
+ for (int rowId = 0; rowId < numRows; rowId++) {
+ values[rowId] = (int) dataBlock.getDouble(rowId, colId);
+ }
+ break;
+ case BIG_DECIMAL:
+ for (int rowId = 0; rowId < numRows; rowId++) {
+ values[rowId] = dataBlock.getBigDecimal(rowId, colId).intValue();
+ }
+ break;
+ default:
+ throw new IllegalStateException(String.format("Cannot extract int values for column: %s with stored type: %s",
+ dataSchema.getColumnName(colId), storedType));
}
-
- switch (columnDataTypes[columnIndex]) {
+ } else {
+ switch (storedType) {
case INT:
- case BOOLEAN:
- rows[rowId] = dataBlock.getInt(rowId, columnIndex);
+ for (int rowId = 0; rowId < numRows; rowId++) {
+ if (nullBitmap.contains(rowId)) {
+ continue;
+ }
+ values[rowId] = dataBlock.getInt(rowId, colId);
Review Comment:
nit: I know performance is not our priority, and the boxing we do here is a
bigger performance issue than what I'm about to suggest, but:
Depending on `numRows`, it may be better to copy all values as in the
non-nullable case and then run a second loop that resets only the null rows.
We can also ask `nullBitmap` whether all rows from `rowId` to `rowId +
numRows` are null. That should be a very fast operation in roaring bitmaps,
and when it holds we can skip the whole loop.
Anyway, this is one of the places where we generate the most garbage in V2. We
really need to refactor this code in the medium term.
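A minimal sketch of the two-pass idea, to make the suggestion concrete. It is
not Pinot code: `NullAwareIntCopy` and `extract` are hypothetical names,
`java.util.BitSet` stands in for `RoaringBitmap` so the example has no
dependencies, and `reader` stands in for `dataBlock.getInt(rowId, colId)`. It
also assumes that reading the slot of a null row is harmless, which would need
to be confirmed for `DataBlock`.

```java
import java.util.BitSet;
import java.util.function.IntUnaryOperator;

// Hypothetical helper, not part of Pinot. BitSet stands in for RoaringBitmap,
// and `reader` stands in for dataBlock.getInt(rowId, colId).
final class NullAwareIntCopy {
  private NullAwareIntCopy() {
  }

  static int[] extract(int numRows, BitSet nullRows, IntUnaryOperator reader) {
    int[] values = new int[numRows];

    // Fast path: every row in [0, numRows) is null, so the default zeros in
    // `values` are already the result and the reader is never called.
    if (nullRows.nextClearBit(0) >= numRows) {
      return values;
    }

    // Pass 1: copy every row with no per-row null check, as in the
    // non-nullable branch. Assumes reading a null row's slot is safe.
    for (int rowId = 0; rowId < numRows; rowId++) {
      values[rowId] = reader.applyAsInt(rowId);
    }

    // Pass 2: visit only the null rows and reset them to the default value.
    for (int rowId = nullRows.nextSetBit(0);
        rowId >= 0 && rowId < numRows;
        rowId = nullRows.nextSetBit(rowId + 1)) {
      values[rowId] = 0;
    }
    return values;
  }
}
```

Whether this beats the single loop with `nullBitmap.contains(rowId)` depends on
`numRows` and on how dense the null bitmap is, so it would need a benchmark
before replacing the current code.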
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]