Yohahaha commented on code in PR #5952:
URL: https://github.com/apache/incubator-gluten/pull/5952#discussion_r1623413974
##########
cpp/core/operators/c2r/ColumnarToRow.h:
##########
@@ -27,7 +28,15 @@ class ColumnarToRowConverter {
virtual ~ColumnarToRowConverter() = default;
- virtual void convert(std::shared_ptr<ColumnarBatch> cb = nullptr) = 0;
+ // We will start conversion from the 'rowId' row of 'cb'. The maximum memory consumption during the grabbing and
+ // swapping process is 'memoryThreshold' bytes. The number of rows successfully converted is stored in the 'numRows_'
+ // variable.
+ virtual void
+ convert(std::shared_ptr<ColumnarBatch> cb = nullptr, int64_t rowId = 0, int64_t memoryThreshold = INT64_MAX) = 0;
Review Comment:
Could we configure `memoryThreshold` on the ColumnarToRowConverter when
initializing it, rather than passing it to every `convert` call?
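
A rough sketch of what that could look like, not the actual Gluten code: the threshold is fixed in the constructor, so `convert` only takes the batch and the starting row. The `ColumnarBatch` struct, the `FixedWidthConverter` subclass, and the `bytesPerRow` parameter are hypothetical stand-ins that keep the example self-contained; a real converter would compute row sizes from the data.

```cpp
#include <cstdint>
#include <limits>
#include <memory>

// Hypothetical stand-in for the real ColumnarBatch; only the row count matters here.
struct ColumnarBatch {
  int64_t numRows = 0;
};

// Sketch: the threshold is set once at construction instead of on every convert() call.
class ColumnarToRowConverter {
 public:
  explicit ColumnarToRowConverter(int64_t memoryThreshold = std::numeric_limits<int64_t>::max())
      : memoryThreshold_(memoryThreshold) {}
  virtual ~ColumnarToRowConverter() = default;

  // Converts rows of 'cb' starting at 'rowId'; the converted row count lands in numRows_.
  virtual void convert(std::shared_ptr<ColumnarBatch> cb = nullptr, int64_t rowId = 0) = 0;

  int64_t numRows() const { return numRows_; }
  int64_t memoryThreshold() const { return memoryThreshold_; }

 protected:
  int64_t numRows_ = 0;
  int64_t memoryThreshold_;
};

// Hypothetical converter that assumes every row takes 'bytesPerRow' bytes (must be > 0).
class FixedWidthConverter : public ColumnarToRowConverter {
 public:
  FixedWidthConverter(int64_t memoryThreshold, int64_t bytesPerRow)
      : ColumnarToRowConverter(memoryThreshold), bytesPerRow_(bytesPerRow) {}

  void convert(std::shared_ptr<ColumnarBatch> cb = nullptr, int64_t rowId = 0) override {
    numRows_ = 0;
    if (!cb || rowId >= cb->numRows) {
      return;
    }
    // Convert as many of the remaining rows as fit under the threshold.
    int64_t remaining = cb->numRows - rowId;
    int64_t fit = memoryThreshold_ / bytesPerRow_;
    numRows_ = remaining < fit ? remaining : fit;
  }

 private:
  int64_t bytesPerRow_;
};
```

With this shape the JNI entry point would only need to pass `rowId`, and the caller can resume from `rowId + numRows()` on the next call.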
##########
cpp/core/jni/JniWrapper.cc:
##########
@@ -580,17 +583,28 @@
Java_org_apache_gluten_vectorized_NativeColumnarToRowJniWrapper_nativeColumnarTo
JNIEnv* env,
jobject wrapper,
jlong batchHandle,
- jlong c2rHandle) {
+ jlong c2rHandle,
+ jlong rowId) {
JNI_METHOD_START
auto ctx = gluten::getRuntime(env, wrapper);
+ auto& conf = ctx->getConfMap();
+
auto columnarToRowConverter =
ctx->objectStore()->retrieve<ColumnarToRowConverter>(c2rHandle);
auto cb = ctx->objectStore()->retrieve<ColumnarBatch>(batchHandle);
- columnarToRowConverter->convert(cb);
+
+ int64_t column2RowMemThreshold = 256 * 1024 * 1024;
Review Comment:
Please move the default value to GlutenConfig.h.
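
Something along these lines, as a sketch only. The constant and helper names are hypothetical and may not match what GlutenConfig.h uses. It also assumes the conf map is a string-to-string map and that the value arrives as a plain byte count.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical names; the real constants in GlutenConfig.h may be spelled differently.
const std::string kColumnarToRowMemoryThreshold = "spark.gluten.sql.columnarToRowMemoryThreshold";
constexpr int64_t kColumnarToRowMemoryThresholdDefault = 256LL * 1024 * 1024;

// Returns the configured threshold in bytes, or the default when the key is absent.
// Assumes the value is a plain integer; std::stoll throws if it is not numeric.
inline int64_t columnarToRowMemoryThreshold(const std::unordered_map<std::string, std::string>& conf) {
  auto it = conf.find(kColumnarToRowMemoryThreshold);
  if (it == conf.end()) {
    return kColumnarToRowMemoryThresholdDefault;
  }
  return std::stoll(it->second);
}
```

That keeps the literal out of JniWrapper.cc, so the default lives in one place next to the key.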
##########
shims/common/src/main/scala/org/apache/gluten/GlutenConfig.scala:
##########
@@ -1031,6 +1038,12 @@ object GlutenConfig {
.checkValue(_ > 0, s"$GLUTEN_MAX_BATCH_SIZE_KEY must be positive.")
.createWithDefault(4096)
+ val GLUTEN_COLUMNAR_TO_ROW_MEM_THRESHOLD =
+ buildConf(GLUTEN_COLUMNAR_TO_ROW_MEM_THRESHOLD_KEY)
+ .internal()
+ .longConf
+ .createWithDefault(256 * 1024 * 1024)
+
Review Comment:
We already support usage such as
`set spark.gluten.sql.columnarToRowMemoryThreshold=256m`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.