LakshSingla commented on code in PR #14629:
URL: https://github.com/apache/druid/pull/14629#discussion_r1269434531
##########
docs/multi-stage-query/reference.md:
##########
@@ -376,6 +377,8 @@ When you run a query, include the context parameter
`durableShuffleStorage` and
For queries where you want to use fault tolerance for workers, set
`faultTolerance` to `true`, which automatically sets `durableShuffleStorage` to
`true`.
+For select queryies which want to write the final result to `durableStoage`,
set `selectDestination`:`durableStorage`. Which shuffle mesh the job uses can
still be controller by `durableShuffleStorage` flag ie. a combination where
`selectDestination`:`durableStorage` and `durableShuffleStorage`:`false` is
perfectly valid.
Review Comment:
1. spelling of queries and durableStorage
2. Something seems off in the second sentence. I think it should be
"controlled" instead of "controller".
3. `durableStoage` (first use) should be written as "durable storage"
(without double quotes) since we are using it as a noun instead of code word.
4. We should use something else instead of "shuffle mesh" in the docs since
it's not a well-documented term in the docs and could be simplified further.
Wdyt about simplifying it to something with following structure:
```
Set `selectDestination`:`durableStorage` for select queries that want to
write the final
results to durable storage instead of the task reports. Saving the results
in the durable
storage allows users to .. (benefits for the user)
The location where the workers can write the intermediate results is
independent of the
location where the final results get stored. Therefore
"durableShuffleStorage":false and
"selectDestination":"durableStorage" is a valid configuration to use in the
query context, that
instructs the controller to persist only the final result in the durable
storage, and not the
intermediate results.
```
##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/indexing/destination/MSQSelectDestination.java:
##########
@@ -27,21 +29,38 @@ public enum MSQSelectDestination
/**
* Writes all the results directly to the report.
*/
- TASK_REPORT(false),
+ TASKREPORT("taskReport", false),
/**
* Writes the results as frame files to durable storage. Task report can be
truncated to a preview.
*/
- DURABLE_STORAGE(true);
+ DURABLESTORAGE("durableStorage", true);
Review Comment:
```suggestion
DURABLE_STORAGE("durableStorage", true);
```
##########
extensions-core/multi-stage-query/src/test/java/org/apache/druid/msq/sql/entity/ResultSetInformationTest.java:
##########
@@ -51,10 +51,10 @@ public class ResultSetInformationTest
new String[]{"2"},
new String[]{"3"}
),
- ImmutableList.of(new PageInformation(1L, 1L, 0))
+ ImmutableList.of(new PageInformation(0, 1L, 1L))
);
- public static final String JSON_STRING =
"{\"numTotalRows\":1,\"totalSizeInBytes\":1,\"resultFormat\":\"object\",\"dataSource\":\"ds\",\"pages\":[{\"numRows\":1,\"sizeInBytes\":1,\"id\":0}]}";
- public static final String JSON_STRING_1 =
"{\"numTotalRows\":1,\"totalSizeInBytes\":1,\"resultFormat\":\"object\",\"dataSource\":\"ds\",\"sampleRecords\":[[\"1\"],[\"2\"],[\"3\"]],\"pages\":[{\"numRows\":1,\"sizeInBytes\":1,\"id\":0}]}";
+ public static final String JSON_STRING =
"{\"numTotalRows\":1,\"totalSizeInBytes\":1,\"resultFormat\":\"object\",\"dataSource\":\"ds\",\"pages\":[{\"id\":0,\"numRows\":1,\"sizeInBytes\":1}]}";
Review Comment:
Can you add a few tests that ensure that nulls in the size or rows are not
serialized?
##########
extensions-core/multi-stage-query/src/test/java/org/apache/druid/msq/sql/SqlMSQStatementResourcePostTest.java:
##########
@@ -273,12 +273,12 @@ public void durableStorageDisabledTest()
NilStorageConnector.getInstance()
);
- String errorMessage = "The SQL Statement API cannot read from the select
destination [DURABLE_STORAGE] provided in "
- + "the query context [selectDestination] since it is
not configured. It is recommended to "
- + "configure the durable storage as it allows the
user to fetch large result sets. "
+ String errorMessage = "The sql statement api cannot read from the select
destination [durableStorage] provided in "
+ + "the query context [selectDestination] since it is
not configured on the [broker]. It is recommended to "
Review Comment:
Using `[broker]` doesn't seem right. We should either write it as:
```
the query context [selectDestination] since it is not configured on the node
[broker].
```
or remove the interpolation
```
the query context [selectDestination] since it is not configured on the
broker.
```
See point (2) here
https://github.com/apache/druid/blob/1ddbaa874429d6f176c90f25189c959d5d1eae55/dev/style-conventions.md?plain=1#L48
##########
processing/src/main/java/org/apache/druid/query/QueryContexts.java:
##########
@@ -454,7 +456,12 @@ public static <E extends Enum<E>> E getAsEnum(String key,
Object value, Class<E>
catch (IllegalArgumentException e) {
throw badValueException(
key,
- StringUtils.format("a value of enum [%s]", clazz.getSimpleName()),
+ StringUtils.format(
+ "referring to one of the values[%s] of enum [%s]",
Review Comment:
There's inconsistent spacing between `values[%s]` and `enum [%s]`
##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/sql/resources/SqlStatementResource.java:
##########
@@ -848,24 +849,25 @@ private void contextChecks(QueryContext queryContext)
}
MSQSelectDestination selectDestination =
MultiStageQueryContext.getSelectDestination(queryContext);
- if (selectDestination == MSQSelectDestination.DURABLE_STORAGE) {
- checkForDurableStorageConnectorImpl();
+ if (MSQSelectDestination.DURABLESTORAGE.equals(selectDestination)) {
+ checkForDurableStorageConnectorImpl(NodeRole.BROKER);
}
}
- private void checkForDurableStorageConnectorImpl()
+ private void checkForDurableStorageConnectorImpl(NodeRole nodeRole)
Review Comment:
This parameter is always NodeRole.BROKER. Why is this change required?
##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/indexing/destination/MSQSelectDestination.java:
##########
@@ -27,21 +29,38 @@ public enum MSQSelectDestination
/**
* Writes all the results directly to the report.
*/
- TASK_REPORT(false),
+ TASKREPORT("taskReport", false),
Review Comment:
We should use a snake case for the static variable names. Please revert it
to the original values.
Since this is changed for equality, we can create a static method that
checks if the String provided equals the enum destination.
```suggestion
TASK_REPORT("taskReport", false),
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]