vinson0526 commented on a change in pull request #3186: Support convert Arrow data to RowBatch asynchronously in Spark-Doris-Connector
URL: https://github.com/apache/incubator-doris/pull/3186#discussion_r397590368
 
 

 ##########
 File path: extension/spark-doris-connector/src/main/java/org/apache/doris/spark/serialization/RowBatch.java
 ##########
 @@ -87,50 +88,39 @@ public RowBatch(TScanBatchResult nextResult, Schema schema) throws DorisExceptio
                 new ByteArrayInputStream(nextResult.getRows()),
                 rootAllocator
                 );
+        this.offsetInRowBatch = 0;
         try {
             this.root = arrowStreamReader.getVectorSchemaRoot();
+            while (arrowStreamReader.loadNextBatch()) {
+                fieldVectors = root.getFieldVectors();
+                if (fieldVectors.size() != schema.size()) {
+                    logger.error("Schema size '{}' is not equal to arrow field size '{}'.",
+                            fieldVectors.size(), schema.size());
+                    throw new DorisException("Load Doris data failed, schema size of fetch data is wrong.");
+                }
+                if (fieldVectors.size() == 0 || root.getRowCount() == 0) {
+                    logger.debug("One batch in arrow has no data.");
+                    continue;
+                }
+                rowCountInOneBatch = root.getRowCount();
+                // init the rowBatch
+                for (int i = 0; i < rowCountInOneBatch; ++i) {
+                    rowBatch.add(new Row(fieldVectors.size()));
+                }
+                convertArrowToRowBatch();
+                readRowCount += root.getRowCount();
 
 Review comment:
   The close() function could be removed and its content moved into a finally block, since all the data is already read from Arrow here.
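
   A minimal sketch of this suggestion, in plain Java. FakeBatchReader and EagerRowBatch are hypothetical stand-ins for ArrowStreamReader and RowBatch (they are not the connector's classes, and the real cleanup may need to release more than the reader). The point is only that once the constructor drains every batch, the reader can be released in finally and no separate close() is needed:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for ArrowStreamReader: yields batches one at a time
// and records whether close() was called.
class FakeBatchReader implements AutoCloseable {
    private final Iterator<List<Integer>> batches;
    private List<Integer> current;
    boolean closed = false;

    FakeBatchReader(List<List<Integer>> data) {
        this.batches = data.iterator();
    }

    // Mirrors the shape of loadNextBatch(): false once the stream is exhausted.
    boolean loadNextBatch() {
        if (!batches.hasNext()) {
            return false;
        }
        current = batches.next();
        return true;
    }

    List<Integer> currentBatch() {
        return current;
    }

    @Override
    public void close() {
        closed = true;
    }
}

// Hypothetical stand-in for RowBatch: reads everything in the constructor,
// so the reader is released right there instead of in a separate close().
class EagerRowBatch {
    private final List<Integer> rows = new ArrayList<>();

    EagerRowBatch(FakeBatchReader reader) {
        try {
            while (reader.loadNextBatch()) {
                if (reader.currentBatch().isEmpty()) {
                    continue; // skip empty batches, as in the diff above
                }
                rows.addAll(reader.currentBatch());
            }
        } finally {
            // Runs on success and on failure, so the reader is never leaked.
            reader.close();
        }
    }

    int readRowCount() {
        return rows.size();
    }
}
```

   With this shape a caller only constructs the batch and iterates its rows; there is no close() call to forget.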

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

