[ 
https://issues.apache.org/jira/browse/SPARK-5932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-5932:
-----------------------------
    Description: 
This is SPARK-5931's sister issue.

The naming of existing byte configs is inconsistent. We currently have the 
following throughout the code base:
{code}
spark.reducer.maxMbInFlight // megabytes
spark.kryoserializer.buffer.mb // megabytes
spark.shuffle.file.buffer.kb // kilobytes
spark.broadcast.blockSize // kilobytes
spark.executor.logs.rolling.size.maxBytes // bytes
spark.io.compression.snappy.block.size // bytes
{code}
Instead, my proposal is to simplify the config names themselves and make 
everything accept sizes in a common format: 500b, 2k, 100m, 46g, similar to 
what we currently use for our memory settings. For instance:
{code}
spark.reducer.maxSizeInFlight = 10m
spark.kryoserializer.buffer = 2m
spark.shuffle.file.buffer = 10k
spark.broadcast.blockSize = 20k
spark.executor.logs.rolling.maxSize = 500b
spark.io.compression.snappy.blockSize = 200b
{code}
All relevant existing configs will be deprecated in favor of the new ones. We 
should do this soon, before we introduce more byte configs.
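For illustration, here is a minimal sketch of how size strings in this format could be parsed into a byte count. This is not Spark's actual implementation; the class and method names are hypothetical, and it assumes binary units (1k = 1024 bytes), matching how our memory settings are interpreted:

```java
// Hypothetical sketch: parse size strings like "500b", "2k", "100m", "46g"
// into a number of bytes. Binary units assumed (1k = 1024).
public class ByteStringParser {
    public static long byteStringAsBytes(String str) {
        String s = str.trim().toLowerCase();
        char unit = s.charAt(s.length() - 1);
        if (Character.isDigit(unit)) {
            // No suffix: interpret as plain bytes.
            return Long.parseLong(s);
        }
        long value = Long.parseLong(s.substring(0, s.length() - 1));
        switch (unit) {
            case 'b': return value;
            case 'k': return value * 1024L;
            case 'm': return value * 1024L * 1024L;
            case 'g': return value * 1024L * 1024L * 1024L;
            default:
                throw new NumberFormatException("Unknown size unit: " + unit);
        }
    }

    public static void main(String[] args) {
        System.out.println(byteStringAsBytes("10m"));  // 10485760
        System.out.println(byteStringAsBytes("10k"));  // 10240
        System.out.println(byteStringAsBytes("500b")); // 500
    }
}
```

A deprecated config like spark.kryoserializer.buffer.mb would then be translated by appending its implied unit (e.g. "2" becomes "2m") before going through the same parser.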

  was:
This is SPARK-5931's sister issue.

The naming of existing byte configs is inconsistent. We currently have the 
following throughout the code base:
{code}
spark.reducer.maxMbInFlight (mb)
spark.kryoserializer.buffer.mb (mb)
spark.shuffle.file.buffer.kb (kb)
spark.broadcast.blockSize (kb)
spark.executor.logs.rolling.size.maxBytes (bytes)
spark.io.compression.snappy.block.size (bytes)
{code}
Instead, my proposal is to simplify the config name itself and make everything 
accept sizes using the following format: 500b, 2k, 100m, 46g, similar to what 
we currently use for our memory settings. For instance:
{code}
spark.reducer.maxSizeInFlight = 10m
spark.kryoserializer.buffer = 2m
spark.shuffle.file.buffer = 10k
spark.broadcast.blockSize = 20k
spark.executor.logs.rolling.maxSize = 500b
spark.io.compression.snappy.blockSize = 200b
{code}
All existing configs that are relevant will be deprecated in favor of the new 
ones. We should do this soon before we keep introducing more byte configs.


> Use consistent naming for byte properties
> -----------------------------------------
>
>                 Key: SPARK-5932
>                 URL: https://issues.apache.org/jira/browse/SPARK-5932
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.0.0
>            Reporter: Andrew Or
>            Assignee: Andrew Or
>
> This is SPARK-5931's sister issue.
> The naming of existing byte configs is inconsistent. We currently have the 
> following throughout the code base:
> {code}
> spark.reducer.maxMbInFlight // megabytes
> spark.kryoserializer.buffer.mb // megabytes
> spark.shuffle.file.buffer.kb // kilobytes
> spark.broadcast.blockSize // kilobytes
> spark.executor.logs.rolling.size.maxBytes // bytes
> spark.io.compression.snappy.block.size // bytes
> {code}
> Instead, my proposal is to simplify the config names themselves and make 
> everything accept sizes in a common format: 500b, 2k, 100m, 46g, 
> similar to what we currently use for our memory settings. For instance:
> {code}
> spark.reducer.maxSizeInFlight = 10m
> spark.kryoserializer.buffer = 2m
> spark.shuffle.file.buffer = 10k
> spark.broadcast.blockSize = 20k
> spark.executor.logs.rolling.maxSize = 500b
> spark.io.compression.snappy.blockSize = 200b
> {code}
> All relevant existing configs will be deprecated in favor of the new 
> ones. We should do this soon, before we introduce more byte configs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
