eric-wang-1990 commented on code in PR #2669:
URL: https://github.com/apache/arrow-adbc/pull/2669#discussion_r2027735213
##########
csharp/src/Drivers/Apache/Spark/SparkDatabricksReader.cs:
##########
@@ -79,6 +91,49 @@ public SparkDatabricksReader(HiveServer2Statement statement, Schema schema)
         }
     }

+    private async Task ProcessFetchedBatchesAsync(CancellationToken cancellationToken)
+    {
+        var batch = this.batches![this.index];
+
+        // Ensure batch data exists
+        if (batch.Batch == null || batch.Batch.Length == 0)
+        {
+            this.index++;
+            return;
+        }
+
+        try
+        {
+            byte[] dataToUse = batch.Batch;
+
+            // If LZ4 compression is enabled, try to decompress the data
+            if (isLz4Compressed)
+            {
+                try
+                {
+                    var dataStream = await Lz4Utilities.DecompressLz4Async(batch.Batch, cancellationToken);
+                    dataToUse = dataStream.ToArray();
+                    dataStream.Dispose();
+                }
+                catch (Exception ex)
+                {
+                    // If decompression fails, use the original data
+                    System.Diagnostics.Debug.WriteLine($"Failed to decompress LZ4 data: {ex.Message}");
+                }
+            }
+
+            // Always use ChunkStream which ensures proper schema handling
+            this.reader = new ArrowStreamReader(new ChunkStream(this.schema, dataToUse));
+        }
+        catch (Exception ex)
+        {
+            // Log any errors and skip this batch
+            System.Diagnostics.Debug.WriteLine($"Error processing batch: {ex.Message}");

Review Comment:
   Yeah, I was planning to throw; somehow this part got omitted. We do not want partial data and should definitely throw here.
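A minimal sketch of the rework the review comment describes: instead of logging and falling back to (possibly corrupt) data, both catch paths propagate the failure so the reader never yields partial results. The exception type (`InvalidOperationException`) is an illustrative assumption, not the actual type used in the PR; field and helper names (`batches`, `isLz4Compressed`, `Lz4Utilities.DecompressLz4Async`, `ChunkStream`) are taken from the diff above.

```csharp
private async Task ProcessFetchedBatchesAsync(CancellationToken cancellationToken)
{
    var batch = this.batches![this.index];

    // A genuinely empty batch is not an error; skip it.
    if (batch.Batch == null || batch.Batch.Length == 0)
    {
        this.index++;
        return;
    }

    byte[] dataToUse = batch.Batch;

    if (isLz4Compressed)
    {
        try
        {
            using var dataStream = await Lz4Utilities.DecompressLz4Async(batch.Batch, cancellationToken);
            dataToUse = dataStream.ToArray();
        }
        catch (Exception ex)
        {
            // Corrupt compressed data: surface the failure rather than
            // silently reading the undecompressed bytes as Arrow data.
            throw new InvalidOperationException("Failed to decompress LZ4 batch", ex);
        }
    }

    // ChunkStream prepends the schema so ArrowStreamReader can parse the bytes.
    this.reader = new ArrowStreamReader(new ChunkStream(this.schema, dataToUse));
}
```

With no outer catch-and-continue, any failure while constructing the reader also propagates to the caller, which matches the "definitely throw here" intent.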