clintropolis commented on a change in pull request #10243:
URL: https://github.com/apache/druid/pull/10243#discussion_r473797085
##########
File path: core/src/main/java/org/apache/druid/data/input/MaxSizeSplitHintSpec.java
##########
@@ -43,22 +45,55 @@
public static final String TYPE = "maxSize";
@VisibleForTesting
- static final long DEFAULT_MAX_SPLIT_SIZE = 512 * 1024 * 1024;
+ static final HumanReadableBytes DEFAULT_MAX_SPLIT_SIZE = new HumanReadableBytes("1GiB");
- private final long maxSplitSize;
+ /**
+ * There are two known issues when a split contains a large list of files.
+ *
+ * - 'jute.maxbuffer' in ZooKeeper. This system property controls the max size of ZNode. As its default is 500KB,
+ * task allocation can fail if the serialized ingestion spec is larger than this limit.
+ * - 'max_allowed_packet' in MySQL. This is the max size of a communication packet sent to a MySQL server.
+ * The default is either 64MB or 4MB depending on MySQL version. Updating metadata store can fail if the serialized
+ * ingestion spec is larger than this limit.
+ *
+ * The default is conservatively chosen as 1000.
+ */
+ @VisibleForTesting
+ static final int DEFAULT_MAX_NUM_FILES = 1000;
+
+ private final HumanReadableBytes maxSplitSize;
+ private final int maxNumFiles;
@JsonCreator
- public MaxSizeSplitHintSpec(@JsonProperty("maxSplitSize") @Nullable Long maxSplitSize)
+ public MaxSizeSplitHintSpec(
+ @JsonProperty("maxSplitSize") @Nullable HumanReadableBytes maxSplitSize,
+ @JsonProperty("maxNumFiles") @Nullable Integer maxNumFiles
+ )
{
this.maxSplitSize = maxSplitSize == null ? DEFAULT_MAX_SPLIT_SIZE : maxSplitSize;
+ this.maxNumFiles = maxNumFiles == null ? DEFAULT_MAX_NUM_FILES : maxNumFiles;
+ Preconditions.checkArgument(this.maxSplitSize.getBytes() > 0, "maxSplitSize should be larger than 0");
+ Preconditions.checkArgument(this.maxNumFiles > 0, "maxNumFiles should be larger than 0");
+ }
+
+ @VisibleForTesting
+ public MaxSizeSplitHintSpec(long maxSplitSize, @Nullable Integer maxNumFiles)
+ {
+ this(new HumanReadableBytes(maxSplitSize), maxNumFiles);
}
@JsonProperty
- public long getMaxSplitSize()
+ public HumanReadableBytes getMaxSplitSize()
Review comment:
Should `SegmentsSplitHintSpec` be updated to accept `HumanReadableBytes`
for `maxInputSegmentBytesPerTask` as well?
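
A rough sketch of what that could look like, mirroring the null-defaulting and the positivity check from the constructor in this diff. Everything here is an assumption for illustration: `ByteSize` is a hypothetical stand-in for `HumanReadableBytes`, `SegmentsSplitHintSpecSketch` is not the real `SegmentsSplitHintSpec`, the 1 GiB default is a placeholder, and the Jackson annotations are left out so the snippet compiles on its own.

```java
/** Hypothetical stand-in for HumanReadableBytes; it only holds a byte count. */
final class ByteSize
{
  private final long bytes;

  ByteSize(long bytes)
  {
    this.bytes = bytes;
  }

  long getBytes()
  {
    return bytes;
  }
}

/** Illustration only; not the real SegmentsSplitHintSpec. */
final class SegmentsSplitHintSpecSketch
{
  // Placeholder default, chosen for this sketch and not taken from the Druid code base.
  static final ByteSize DEFAULT_MAX_INPUT_SEGMENT_BYTES_PER_TASK = new ByteSize(1024L * 1024 * 1024);

  private final ByteSize maxInputSegmentBytesPerTask;

  // A null argument falls back to the default, as in the MaxSizeSplitHintSpec constructor above.
  SegmentsSplitHintSpecSketch(ByteSize maxInputSegmentBytesPerTask)
  {
    this.maxInputSegmentBytesPerTask = maxInputSegmentBytesPerTask == null
                                       ? DEFAULT_MAX_INPUT_SEGMENT_BYTES_PER_TASK
                                       : maxInputSegmentBytesPerTask;
    // Reject zero or negative sizes up front.
    if (this.maxInputSegmentBytesPerTask.getBytes() <= 0) {
      throw new IllegalArgumentException("maxInputSegmentBytesPerTask should be larger than 0");
    }
  }

  ByteSize getMaxInputSegmentBytesPerTask()
  {
    return maxInputSegmentBytesPerTask;
  }
}
```

With this shape, both split hint specs would take their size limits in the same way.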