jinchengchenghh commented on issue #7860:
URL:
https://github.com/apache/incubator-gluten/issues/7860#issuecomment-2467101676
Velox uses `SpillReadFile` to read the spill file. It reads through
`FileInputStream` and copies the bytes with `simd::memcpy`, outputting
`RowVector` batches one by one. `FileInputStream` reads via
`velox::LocalReadFile`, using `pread` or `preadv`.
As far as I can see, each read fetches up to `bufferSize_` bytes, which is
controlled by the QueryConfig `kSpillReadBufferSize` (default 1MB). Note: if
the file system supports async read, it reads double `bufferSize_` at a time.
@FelixYBW
```
  readBytes = readSize();
  VELOX_CHECK_LT(
      0, readBytes, "Read past end of FileInputStream {}", fileSize_);
  NanosecondTimer timer_2{&readTimeNs};
  file_->pread(fileOffset_, readBytes, buffer()->asMutable<char>());

// ...

uint64_t FileInputStream::readSize() const {
  return std::min(fileSize_ - fileOffset_, bufferSize_);
}
```
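
To make the chunking concrete, here is a minimal standalone sketch of the same read-size logic using plain POSIX `pread` instead of Velox's classes. `ChunkedReader` and `readChunkSizes` are hypothetical names for illustration, not Velox APIs. Only the `std::min(fileSize - fileOffset, bufferSize)` computation mirrors the snippet above; async double buffering is not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <fcntl.h>
#include <stdexcept>
#include <string>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include <vector>

// Hypothetical illustration only; this is not Velox's FileInputStream.
class ChunkedReader {
 public:
  ChunkedReader(int fd, uint64_t fileSize, uint64_t bufferSize)
      : fd_(fd), fileSize_(fileSize), bufferSize_(bufferSize),
        buffer_(bufferSize) {
    assert(bufferSize > 0);
  }

  // Same computation as FileInputStream::readSize() quoted above:
  // a full buffer, or whatever is left at the tail of the file.
  uint64_t readSize() const {
    return std::min(fileSize_ - fileOffset_, bufferSize_);
  }

  bool atEnd() const { return fileOffset_ >= fileSize_; }

  // Reads the next chunk into the internal buffer and returns its size.
  uint64_t readNext() {
    const uint64_t want = readSize();
    if (want == 0) {
      throw std::runtime_error("Read past end of file");
    }
    uint64_t got = 0;
    while (got < want) {  // pread may return fewer bytes than requested
      const ssize_t n = ::pread(fd_, buffer_.data() + got, want - got,
                                static_cast<off_t>(fileOffset_ + got));
      if (n <= 0) {
        throw std::runtime_error("pread failed or hit unexpected EOF");
      }
      got += static_cast<uint64_t>(n);
    }
    fileOffset_ += got;
    return got;
  }

 private:
  int fd_;
  uint64_t fileSize_;
  uint64_t bufferSize_;
  uint64_t fileOffset_{0};
  std::vector<char> buffer_;
};

// Opens `path`, reads it chunk by chunk, and returns the size of each read.
std::vector<uint64_t> readChunkSizes(const std::string& path,
                                     uint64_t bufferSize) {
  const int fd = ::open(path.c_str(), O_RDONLY);
  if (fd < 0) {
    throw std::runtime_error("open failed");
  }
  struct stat st;
  if (::fstat(fd, &st) != 0) {
    ::close(fd);
    throw std::runtime_error("fstat failed");
  }
  std::vector<uint64_t> sizes;
  try {
    ChunkedReader reader(fd, static_cast<uint64_t>(st.st_size), bufferSize);
    while (!reader.atEnd()) {
      sizes.push_back(reader.readNext());
    }
  } catch (...) {
    ::close(fd);
    throw;
  }
  ::close(fd);
  return sizes;
}
```

With a 2500-byte file and a 1024-byte buffer, this sketch issues three reads: two full buffers followed by a shorter tail read.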
```
/* Read data from file descriptor FD at the given position OFFSET
   without change the file pointer, and put the result in the buffers
   described by IOVEC, which is a vector of COUNT 'struct iovec's.
   The buffers are filled in the order specified.  Operates just like
   'pread' (see <unistd.h>) except that data are put in IOVEC instead
   of a contiguous buffer.

   This function is a cancellation point and therefore not marked with
   __THROW.  */
extern ssize_t preadv (int __fd, const struct iovec *__iovec, int __count,
                       __off_t __offset) __wur;
```
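
For reference, a small sketch of what a `preadv` call looks like from the caller's side: one system call fills several buffers in order, and the file pointer is left untouched. `scatterRead` is a hypothetical helper written for this example; it does not show how Velox itself builds its `iovec` array.

```cpp
#include <cassert>
#include <fcntl.h>
#include <string>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

// Hypothetical helper: fills `first`, then `second`, with a single preadv
// call starting at `offset`. Both strings must be pre-sized by the caller.
// Returns the number of bytes read, or -1 on error (errno is set).
ssize_t scatterRead(int fd, off_t offset, std::string& first,
                    std::string& second) {
  assert(fd >= 0);
  struct iovec iov[2];
  iov[0].iov_base = &first[0];
  iov[0].iov_len = first.size();
  iov[1].iov_base = &second[0];
  iov[1].iov_len = second.size();
  // Buffers are filled in array order; the file offset is not changed.
  return ::preadv(fd, iov, 2, offset);
}
```

Compared with issuing one `pread` per buffer, this saves system calls when the destination memory is not contiguous.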