kfaraz commented on code in PR #13303:
URL: https://github.com/apache/druid/pull/13303#discussion_r1065327862
##########
core/src/main/java/org/apache/druid/indexer/partitions/SingleDimensionPartitionsSpec.java:
##########
@@ -146,6 +154,44 @@ public String getForceGuaranteedRollupIncompatiblityReason()
     return FORCE_GUARANTEED_ROLLUP_COMPATIBLE;
   }
+  @Override
+  public List<List<String>> getDimensionGroupingSet(DimensionsSpec dimensionsSpec)
Review Comment:
This method always returns a singleton list, so I think it should just
return a `List<String>` rather than a list of lists.
This whole logic should be kept in `DeterminePartitionsJob` itself. We can
just have logic like this:
```
List<String> partitionDimensions = partitionsSpec.getPartitionDimensions();
if (partitionDimensions.isEmpty()) {
// determine the partitions from the DimensionsSpec
}
```
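For instance, the fallback could be as simple as reading the declared dimension
names from the schema (a rough sketch only, assuming the `DimensionsSpec` is
already available at that point in `DeterminePartitionsJob`):
```
List<String> partitionDimensions = partitionsSpec.getPartitionDimensions();
if (partitionDimensions.isEmpty()) {
  // Fall back to the dimensions declared in the ingestion schema's DimensionsSpec.
  partitionDimensions = dimensionsSpec.getDimensionNames();
}
```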
##########
core/src/main/java/org/apache/druid/indexer/partitions/DimensionRangePartitionsSpec.java:
##########
@@ -144,6 +148,27 @@ public List<String> getPartitionDimensions()
     return partitionDimensions;
   }
+  public List<List<String>> getDimensionGroupingSet(DimensionsSpec dimensionsSpec)
Review Comment:
The logic in these new methods should be kept in `DeterminePartitionsJob`
itself, as these methods are only used for Hadoop ingestion.
##########
indexing-hadoop/src/main/java/org/apache/druid/indexer/DeterminePartitionsJob.java:
##########
@@ -626,33 +677,39 @@ protected void innerReduce(Context context, SortableBytes keyBytes, Iterable<Dim
      final DimValueCount firstDvc = iterator.next();
      final long totalRows = firstDvc.numRows;
-     if (!"".equals(firstDvc.dim) || !"".equals(firstDvc.value)) {
+     if (!Collections.emptyList().equals(firstDvc.dims) || firstDvc.values.size() != 0) {
Review Comment:
```suggestion
if (!firstDvc.dims.isEmpty() || !firstDvc.values.isEmpty()) {
```
##########
indexing-hadoop/src/main/java/org/apache/druid/indexer/DeterminePartitionsJob.java:
##########
@@ -437,13 +458,18 @@ protected void innerMap(
  static class DeterminePartitionsDimSelectionMapperHelper
  {
    private final HadoopDruidIndexerConfig config;
-   private final String partitionDimension;
+   private final List<List<String>> dimensionGroupingSet;
    private final Map<Long, Integer> intervalIndexes;
+   private final boolean supportNullValue;
-   DeterminePartitionsDimSelectionMapperHelper(HadoopDruidIndexerConfig config, String partitionDimension)
+   DeterminePartitionsDimSelectionMapperHelper(HadoopDruidIndexerConfig config)
    {
      this.config = config;
-     this.partitionDimension = partitionDimension;
+
+     DimensionRangePartitionsSpec spec = (DimensionRangePartitionsSpec) config.getPartitionsSpec();
+     final DimensionsSpec dimensionsSpec = config.getSchema().getDataSchema().getDimensionsSpec();
+     this.dimensionGroupingSet = new ArrayList<>(spec.getDimensionGroupingSet(dimensionsSpec));
+     this.supportNullValue = spec.supportNullValue();
Review Comment:
This can be determined by using the count of partition dimensions instead.
```suggestion
this.supportNullValue = partitionDimensions.size() > 1;
```
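For illustration, the constructor could then be trimmed to something like the
sketch below (names such as `partitionDimensions` are only illustrative and
assume the `List<String>` change suggested in the earlier comment):
```
DeterminePartitionsDimSelectionMapperHelper(HadoopDruidIndexerConfig config)
{
  this.config = config;

  final DimensionRangePartitionsSpec spec = (DimensionRangePartitionsSpec) config.getPartitionsSpec();
  final List<String> partitionDimensions = spec.getPartitionDimensions();

  // Per the suggestion above: null-value support follows from partitioning on multiple dimensions.
  this.supportNullValue = partitionDimensions.size() > 1;

  // ... remaining initialization (intervalIndexes, etc.) stays as it is today.
}
```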
##########
indexing-hadoop/src/main/java/org/apache/druid/indexer/DeterminePartitionsJob.java:
##########
@@ -476,22 +502,29 @@ void emitDimValueCounts(
      final byte[] groupKey = buf.array();
      // Emit row-counter value.
-     write(context, groupKey, new DimValueCount("", "", 1));
-
-     for (final Map.Entry<String, Iterable<String>> dimAndValues : dims.entrySet()) {
-       final String dim = dimAndValues.getKey();
-
-       if (partitionDimension == null || partitionDimension.equals(dim)) {
-         final Iterable<String> dimValues = dimAndValues.getValue();
-
-         if (Iterables.size(dimValues) == 1) {
-           // Emit this value.
-           write(context, groupKey, new DimValueCount(dim, Iterables.getOnlyElement(dimValues), 1));
+     write(context, groupKey, new DimValueCount(Collections.emptyList(), StringTuple.create(), 1));
+
+     Iterator<List<String>> dimensionGroupIterator = dimensionGroupingSet.iterator();
Review Comment:
This should be simplified as the `dimensionGroupingSet` is always a
singleton list.
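For example (a sketch only; with the `List<String>` change suggested earlier,
the grouping set and this lookup would disappear entirely):
```
// The grouping set is always a singleton, so the iterator can simply be replaced with:
final List<String> groupDims = Iterables.getOnlyElement(dimensionGroupingSet);
```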