LakshSingla commented on code in PR #13981:
URL: https://github.com/apache/druid/pull/13981#discussion_r1151335632
##########
processing/src/main/java/org/apache/druid/frame/channel/ReadableInputStreamFrameChannel.java:
##########
@@ -40,6 +40,7 @@
*/
public class ReadableInputStreamFrameChannel implements ReadableFrameChannel
{
+ private boolean toStart = true;
Review Comment:
```suggestion
private boolean started = false;
```
`toStart` seems unclear for a boolean; something like `started` or
`hasStarted` seems more appropriate.
##########
extensions-core/s3-extensions/src/main/java/org/apache/druid/storage/s3/output/S3StorageConnector.java:
##########
@@ -142,6 +144,12 @@ public boolean hasMoreElements()
@Override
public InputStream nextElement()
{
+ // Since SequenceInputStream calls nextElement() in its constructor, we would start
+ // chunking as soon as read() is called. To avoid that, we pass a null input stream
+ // for the first iteration.
+ if (!initStream) {
Review Comment:
I was wondering whether there is a slightly better way to achieve this,
instead of this one-off handling. The issue arises because `nextElement()`
prefetches the InputStream that should return the data, while the
S3StorageConnector is simultaneously chunking the file.
If we create a new InputStream that acts as a wrapper over this initializing
code (the part of `nextElement` that copies the chunk to a file), we won't
need the `initStream` flag and its special handling.
```
InputStream lazyChunkStream = new InputStream()
{
  private FileInputStream fis = null;

  @Override
  public int read() throws IOException
  {
    if (fis == null) {
      File f = chunkFromS3(range); // Implement this: copy the next chunk to a local file
      fis = new FileInputStream(f);
    }
    return fis.read();
  }
};
```
One advantage of the above approach is that we wouldn't be prefetching
a chunk. (When read() is called on the SequenceInputStream, it calls
`nextStream()` to prepare the next element in the iteration. Since the current
change handles only the initialization, we would still be prefetching before
read() is called on the next element. There might not be a huge benefit to
this, though, if we are reading the S3 object continuously.)
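To make the prefetch concrete: `SequenceInputStream`'s enumeration-based constructor calls `nextElement()` once up front, before any `read()`. Here is a minimal standalone sketch demonstrating that behavior (the class and method names are hypothetical, not from this PR; `nextElement()` stands in for an expensive S3 chunk download):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.util.Enumeration;
import java.util.concurrent.atomic.AtomicInteger;

public class EagerNextElementDemo
{
  // Returns how many times nextElement() was invoked just by constructing
  // the SequenceInputStream, before any read() call.
  static int constructionFetchCount() throws IOException
  {
    final AtomicInteger fetches = new AtomicInteger(0);
    Enumeration<InputStream> chunks = new Enumeration<InputStream>()
    {
      private int next = 0;

      @Override
      public boolean hasMoreElements()
      {
        return next < 2;
      }

      @Override
      public InputStream nextElement()
      {
        fetches.incrementAndGet(); // pretend this is an expensive S3 chunk download
        next++;
        return new ByteArrayInputStream(new byte[]{(byte) next});
      }
    };

    // The constructor eagerly calls nextElement() once to set up the first stream.
    InputStream sis = new SequenceInputStream(chunks);
    int count = fetches.get();
    sis.close();
    return count;
  }

  public static void main(String[] args) throws IOException
  {
    System.out.println("fetches after construction: " + constructionFetchCount());
  }
}
```

This is why a wrapper that defers the chunking into `read()` avoids the one-off `initStream` handling: construction becomes free, and the first fetch happens only when data is actually requested.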
##########
processing/src/main/java/org/apache/druid/frame/channel/ReadableInputStreamFrameChannel.java:
##########
@@ -150,6 +152,10 @@ public void close()
private void startReading()
{
+ if (!toStart) {
Review Comment:
Is this change still required once the cause is fixed in the
`S3StorageConnector`? (If so, can you document the behavior that we buffer
the input stream lazily?)
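For documenting the lazy-buffering behavior, a minimal sketch of the guard pattern in question (class and method names here are hypothetical, not the actual channel code; the flag only defers the start of buffering until first use):

```java
import java.io.InputStream;

public class LazyReader
{
  private final InputStream in;
  private boolean started = false; // flips once, on first use

  public LazyReader(InputStream in)
  {
    this.in = in;
  }

  // Start buffering only when the channel is actually read, not at construction.
  public void startReadingIfNeeded()
  {
    if (started) {
      return;
    }
    started = true;
    // kick off background buffering of `in` here
  }

  public boolean isStarted()
  {
    return started;
  }
}
```

The guard makes repeated calls idempotent, which is what the "buffer the input stream lazily" documentation would describe.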
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]