CodiumAI-Agent commented on PR #8586: URL: https://github.com/apache/incubator-gluten/pull/8586#issuecomment-2636199280
## PR Reviewer Guide ๐ Here are some key observations to aid the review process: <table> <tr><td> **๐ซ Ticket compliance analysis ๐ถ** **[8479](https://github.com/apache/incubator-gluten/issues/8479) - Partially compliant** Compliant requirements: - Split backend configurations into individual modules that belong to each other. - Move Velox-related configurations to VeloxConfig and CH-related configurations to CHConf. - Delete unused configuration retrieval methods. Non-compliant requirements: - Requires further human verification: - Ensure that the configurations are correctly split and assigned to their respective modules. - Validate that the deleted configuration retrieval methods are indeed unused. </td></tr> <tr><td>โฑ๏ธ <strong>Estimated effort to review</strong>: 4 ๐ต๐ต๐ต๐ตโช</td></tr> <tr><td>๐งช <strong>PR contains tests</strong></td></tr> <tr><td>๐ <strong>No security concerns identified</strong></td></tr> <tr><td>โก <strong>Recommended focus areas for review</strong><br><br> <details><summary><a href='https://github.com/apache/incubator-gluten/pull/8586/files#diff-30fb328f9a10f0e7640aee2f5339ed33856ac67aa643cd861c26156f6b5e8292R37-R49'><strong>Possible Issue</strong></a> Ensure that the `ResizeRange` assertions for `min` and `max` values are robust and handle edge cases properly.</summary> ```scala case class ResizeRange(min: Int, max: Int) { assert(max >= min) assert(min > 0, "Min batch size should be larger than 0") assert(max > 0, "Max batch size should be larger than 0") } def veloxResizeBatchesShuffleInputRange: ResizeRange = { val standardSize = getConf(COLUMNAR_MAX_BATCH_SIZE) val defaultMinSize: Int = (0.25 * standardSize).toInt.max(1) val minSize = getConf(COLUMNAR_VELOX_RESIZE_BATCHES_SHUFFLE_INPUT_MIN_SIZE) .getOrElse(defaultMinSize) ResizeRange(minSize, Int.MaxValue) } ``` </details> <details><summary><a href='https://github.com/apache/incubator-gluten/pull/8586/files#diff-ba578706f0bf62b89b59e8057e2c6015ad7aee24e60b49522a9155095ff5044fR55-R74'><strong>Configuration Dependency</strong></a> Validate the dependency between `COLUMNAR_VELOX_CACHE_ENABLED` and `COLUMNAR_VELOX_FILE_HANDLE_CACHE_ENABLED` to ensure no runtime issues occur.</summary> ```scala // When the Velox cache is enabled, the Velox file handle cache should also be enabled. // Otherwise, a 'reference id not found' error may occur. if ( conf.getBoolean(COLUMNAR_VELOX_CACHE_ENABLED.key, false) && !conf.getBoolean(COLUMNAR_VELOX_FILE_HANDLE_CACHE_ENABLED.key, false) ) { throw new IllegalArgumentException( s"${COLUMNAR_VELOX_CACHE_ENABLED.key} and " + s"${COLUMNAR_VELOX_FILE_HANDLE_CACHE_ENABLED.key} should be enabled together.") } if ( conf.getBoolean(COLUMNAR_VELOX_CACHE_ENABLED.key, false) && conf.getSizeAsBytes(LOAD_QUANTUM.key, LOAD_QUANTUM.defaultValueString) > 8 * 1024 * 1024 ) { throw new IllegalArgumentException( s"Velox currently only support up to 8MB load quantum size " + s"on SSD cache enabled by ${COLUMNAR_VELOX_CACHE_ENABLED.key}, " + s"User can set ${LOAD_QUANTUM.key} <= 8MB skip this error.") } ``` </details> <details><summary><a href='https://github.com/apache/incubator-gluten/pull/8586/files#diff-72571ea9001f2db607c8bb188804ba74a2427cd8aa2e52a24c54d2e53616d858R70-R82'><strong>Configuration Default Values</strong></a> Verify that the default values for `COLUMNAR_CH_SHUFFLE_SPILL_THRESHOLD` and `COLUMNAR_CH_MAX_SORT_BUFFER_SIZE` are appropriate for typical use cases.</summary> ```scala val COLUMNAR_CH_SHUFFLE_SPILL_THRESHOLD = buildConf("spark.gluten.sql.columnar.backend.ch.spillThreshold") .internal() .doc("Shuffle spill threshold on ch backend") .bytesConf(ByteUnit.BYTE) .createWithDefaultString("0MB") val COLUMNAR_CH_MAX_SORT_BUFFER_SIZE = buildConf("spark.gluten.sql.columnar.backend.ch.maxSortBufferSize") .internal() .doc("The maximum size of sort shuffle buffer in CH backend.") .bytesConf(ByteUnit.BYTE) .createWithDefaultString("0") ``` </details> </td></tr> </table> -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
