[
https://issues.apache.org/jira/browse/SPARK-5932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew Or updated SPARK-5932:
-----------------------------
Description:
This is SPARK-5931's sister issue.
The naming of existing byte configs is inconsistent. We currently have the
following throughout the code base:
{code}
spark.reducer.maxMbInFlight (mb)
spark.kryoserializer.buffer.mb (mb)
spark.shuffle.file.buffer.kb (kb)
spark.broadcast.blockSize (kb)
spark.executor.logs.rolling.size.maxBytes (bytes)
spark.io.compression.snappy.block.size (bytes)
{code}
Instead, my proposal is to simplify the config names themselves and make every
byte config accept a size in the following format: 500b, 2k, 100m, 46g, similar
to what we currently use for our memory settings. For instance:
{code}
spark.reducer.maxSizeInFlight = 10m
spark.kryoserializer.buffer = 2m
spark.shuffle.file.buffer = 10k
spark.broadcast.blockSize = 20k
spark.executor.logs.rolling.maxSize = 500b
spark.io.compression.snappy.blockSize = 200b
{code}
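To make the proposed value format concrete, here is a minimal sketch of how a size string such as 500b, 2k, 100m or 46g could be turned into a byte count. This is only an illustration, not Spark's implementation: the function name `parse_byte_string` is hypothetical, and it assumes the suffixes are binary multiples (1k = 1024 bytes), which the description above does not specify.

```python
import re

# Assumption: suffixes are binary multiples (1k = 1024 bytes).
# The issue text itself does not say whether k means 1000 or 1024.
_UNIT_BYTES = {"b": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
_SIZE_PATTERN = re.compile(r"(\d+)([bkmg])")


def parse_byte_string(value: str) -> int:
    """Convert a size string like '10m' into a number of bytes.

    Hypothetical helper for illustration only; not a Spark API.
    """
    match = _SIZE_PATTERN.fullmatch(value.strip().lower())
    if match is None:
        raise ValueError(f"invalid size string: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNIT_BYTES[unit]


if __name__ == "__main__":
    for example in ("500b", "2k", "100m", "46g"):
        print(example, "->", parse_byte_string(example), "bytes")
```

Rejecting a bare number (no suffix) is a design choice in this sketch: it forces every value to state its unit, which is the point of the proposal.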
We should do this soon, before we introduce more byte configs.
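As a rough illustration of what the rename could mean for existing settings, the sketch below rewrites old-style keys to the proposed ones and appends the unit that the old name implied. The pairing of old and new keys is inferred from the order of the two lists above, and `LEGACY_BYTE_CONFIGS` and `migrate` are hypothetical names, not part of Spark.

```python
# Assumed pairing: old key -> (proposed key, unit suffix implied by the old name).
# The pairing follows the order of the two lists in the description.
LEGACY_BYTE_CONFIGS = {
    "spark.reducer.maxMbInFlight": ("spark.reducer.maxSizeInFlight", "m"),
    "spark.kryoserializer.buffer.mb": ("spark.kryoserializer.buffer", "m"),
    "spark.shuffle.file.buffer.kb": ("spark.shuffle.file.buffer", "k"),
    "spark.broadcast.blockSize": ("spark.broadcast.blockSize", "k"),
    "spark.executor.logs.rolling.size.maxBytes": (
        "spark.executor.logs.rolling.maxSize", "b"),
    "spark.io.compression.snappy.block.size": (
        "spark.io.compression.snappy.blockSize", "b"),
}


def migrate(settings: dict) -> dict:
    """Return a copy of settings with old byte configs renamed.

    A bare numeric value under an old key gets the unit suffix that the
    old name implied; anything else is passed through unchanged.
    """
    migrated = {}
    for key, value in settings.items():
        if key in LEGACY_BYTE_CONFIGS and value.isdigit():
            new_key, suffix = LEGACY_BYTE_CONFIGS[key]
            migrated[new_key] = value + suffix
        else:
            migrated[key] = value
    return migrated


if __name__ == "__main__":
    old = {"spark.reducer.maxMbInFlight": "48",
           "spark.shuffle.file.buffer.kb": "32"}
    print(migrate(old))
```

Note that spark.broadcast.blockSize keeps its name, so in this sketch a bare number under that key is read as kilobytes, while a value that already carries a suffix is left alone.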
was:
This is SPARK-5931's sister issue.
The naming of existing byte configs is inconsistent. We currently have the
following throughout the code base:
spark.reducer.maxMbInFlight (mb)
spark.kryoserializer.buffer.mb (mb)
spark.shuffle.file.buffer.kb (kb)
spark.broadcast.blockSize (kb)
spark.executor.logs.rolling.size.maxBytes (bytes)
spark.io.compression.snappy.block.size (bytes)
Instead, my proposal is to simplify the config name itself and make everything
accept time using the following format: 500b, 2k, 100m, 46g, similar to what we
currently use for our memory settings. For instance:
spark.reducer.maxSizeInFlight = 10m
spark.kryoserializer.buffer = 2m
spark.shuffle.file.buffer = 10k
spark.broadcast.blockSize = 20k
spark.executor.logs.rolling.maxSize = 500b
spark.io.compression.snappy.blockSize = 200b
We should do this soon before we keep introducing more time configs.
> Use consistent naming for byte properties
> -----------------------------------------
>
> Key: SPARK-5932
> URL: https://issues.apache.org/jira/browse/SPARK-5932
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.0.0
> Reporter: Andrew Or
> Assignee: Andrew Or
>
> This is SPARK-5931's sister issue.
> The naming of existing byte configs is inconsistent. We currently have the
> following throughout the code base:
> {code}
> spark.reducer.maxMbInFlight (mb)
> spark.kryoserializer.buffer.mb (mb)
> spark.shuffle.file.buffer.kb (kb)
> spark.broadcast.blockSize (kb)
> spark.executor.logs.rolling.size.maxBytes (bytes)
> spark.io.compression.snappy.block.size (bytes)
> {code}
> Instead, my proposal is to simplify the config names themselves and make
> every byte config accept a size in the following format: 500b, 2k, 100m, 46g,
> similar to what we currently use for our memory settings. For instance:
> {code}
> spark.reducer.maxSizeInFlight = 10m
> spark.kryoserializer.buffer = 2m
> spark.shuffle.file.buffer = 10k
> spark.broadcast.blockSize = 20k
> spark.executor.logs.rolling.maxSize = 500b
> spark.io.compression.snappy.blockSize = 200b
> {code}
> We should do this soon, before we introduce more byte configs.