amunra opened a new issue, #6995:
URL: https://github.com/apache/opendal/issues/6995

   ### Describe the bug
   
   We need to obtain an async reader from an OpenDAL reader.
   We call the `reader.into_futures_async_read` method, passing in a full range (`..`) as recommended in the API docs.
   
   We often get a _partial_ read back instead of the latest full contents.
   
   It looks like the issue comes from here:
   
   ```rust
       #[inline]
       pub async fn into_futures_async_read(
           self,
           range: impl RangeBounds<u64>,
       ) -> Result<FuturesAsyncReader> {
           let range = self.ctx.parse_into_range(range).await?;
           Ok(FuturesAsyncReader::new(self.ctx, range))
       }
   ```
   
   ```rust
       pub(crate) async fn parse_into_range(
           &self,
           range: impl RangeBounds<u64>,
       ) -> Result<Range<u64>> {
           let start = match range.start_bound() {
               Bound::Included(v) => *v,
               Bound::Excluded(v) => v + 1,
               Bound::Unbounded => 0,
           };
   
           let end = match range.end_bound() {
               Bound::Included(v) => v + 1,
               Bound::Excluded(v) => *v,
               Bound::Unbounded => {         // <<<<<<<<<<<<<<<<< BAD!
                   let mut op_stat = OpStat::new();
   
                   if let Some(v) = self.args().version() {
                       op_stat = op_stat.with_version(v);
                   }
   
                   self.accessor()
                       .stat(self.path(), op_stat)
                       .await?
                       .into_metadata()
                       .content_length()
               }
           };
   
           Ok(start..end)
       }
   ```
   
   This has multiple issues:
   * It doubles the number of requests to the object store, which adds latency and cost.
   * The logic has a race condition: the length returned by `stat` may be stale by the time the read is initiated.
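   
   The race can be replayed deterministically with a small std-only sketch. `FakeStore` and its methods are hypothetical stand-ins for an object store, not OpenDAL types; the sketch only mirrors the stat-then-read ordering in `parse_into_range` above.
   
   ```rust
   use std::sync::Mutex;
   
   /// Hypothetical in-memory stand-in for an object store (not an OpenDAL type).
   struct FakeStore {
       object: Mutex<Vec<u8>>,
   }
   
   impl FakeStore {
       fn new(initial: &[u8]) -> Self {
           Self { object: Mutex::new(initial.to_vec()) }
       }
   
       /// Mimics a `stat` request: returns the current content length.
       fn stat_len(&self) -> u64 {
           self.object.lock().unwrap().len() as u64
       }
   
       /// Replaces the whole object, as a concurrent writer would.
       fn overwrite(&self, data: &[u8]) {
           *self.object.lock().unwrap() = data.to_vec();
       }
   
       /// Mimics a ranged read; `end = None` means "to the end of the object".
       fn read(&self, start: u64, end: Option<u64>) -> Vec<u8> {
           let obj = self.object.lock().unwrap();
           let len = obj.len() as u64;
           let end = end.unwrap_or(len).min(len);
           let start = start.min(end);
           obj[start as usize..end as usize].to_vec()
       }
   }
   
   /// Replays the interleaving: stat, concurrent overwrite, then read.
   /// Returns (bytes from the stat-derived bounded read, bytes from an open-ended read).
   fn demo_race() -> (Vec<u8>, Vec<u8>) {
       let store = FakeStore::new(b"v1");
       let stale_len = store.stat_len(); // request 1: length is 2
       store.overwrite(b"version-2"); // the writer lands between the two requests
       let bounded = store.read(0, Some(stale_len)); // request 2: uses the stale end
       let open_ended = store.read(0, None); // single request, no precomputed end
       (bounded, open_ended)
   }
   
   fn main() {
       let (bounded, open_ended) = demo_race();
       println!("bounded:    {:?}", String::from_utf8_lossy(&bounded));
       println!("open-ended: {:?}", String::from_utf8_lossy(&open_ended));
   }
   ```
   
   In this replay the bounded read returns only the first two bytes of the new object, while the open-ended read returns all of it.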
   
   The data in our object store is often overwritten at the same path in quick succession.
   We've noticed a number of cases where the content we read back is corrupt because it is too short.
   
   
   
   ### Steps to Reproduce
   
   One writer overwrites a path in a tight loop.
   A reader reads the same path in a tight loop, using the futures reader.
   
   ### Expected Behavior
   
   An unbounded end range should be handled as a separate special case. It should not be converted to a bounded range; doing so is fundamentally a bug.
   
   The `..` range should always return the full latest data without accidental 
truncation and should do so without additional requests to the object store.
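   
   One possible shape for this, as a minimal std-only sketch: keep the unbounded end as `None` instead of resolving it with `stat`, and let the backend issue an open-ended request. `OpenRange`, `ReadPlan`, `parse_open_range` and `plan_read` are hypothetical names for illustration, not existing OpenDAL APIs, and the `bytes=N-` form assumes an HTTP-based backend that accepts standard `Range` headers.
   
   ```rust
   use std::ops::{Bound, RangeBounds};
   
   /// Hypothetical range type that keeps an unbounded end as `None`.
   #[derive(Debug, Clone, Copy, PartialEq, Eq)]
   struct OpenRange {
       start: u64,
       /// Exclusive end; `None` means "read to the end of the object".
       end: Option<u64>,
   }
   
   /// Hypothetical description of the single request a backend would issue.
   #[derive(Debug, PartialEq, Eq)]
   enum ReadPlan {
       /// Nothing to read.
       Empty,
       /// Plain GET without a `Range` header.
       Whole,
       /// GET with this `Range` header value.
       Header(String),
   }
   
   /// Converts any `RangeBounds<u64>` without touching the object store.
   fn parse_open_range(range: impl RangeBounds<u64>) -> OpenRange {
       let start = match range.start_bound() {
           Bound::Included(v) => *v,
           Bound::Excluded(v) => v + 1,
           Bound::Unbounded => 0,
       };
       let end = match range.end_bound() {
           Bound::Included(v) => Some(v + 1),
           Bound::Excluded(v) => Some(*v),
           Bound::Unbounded => None, // no stat call, no stale length
       };
       OpenRange { start, end }
   }
   
   /// Maps the range onto one request; HTTP byte ranges are inclusive.
   fn plan_read(r: OpenRange) -> ReadPlan {
       match r.end {
           Some(end) if end <= r.start => ReadPlan::Empty,
           Some(end) => ReadPlan::Header(format!("bytes={}-{}", r.start, end - 1)),
           None if r.start == 0 => ReadPlan::Whole,
           None => ReadPlan::Header(format!("bytes={}-", r.start)),
       }
   }
   
   fn main() {
       println!("{:?}", plan_read(parse_open_range(..)));
       println!("{:?}", plan_read(parse_open_range(5u64..)));
       println!("{:?}", plan_read(parse_open_range(2u64..=4)));
   }
   ```
   
   With this shape, `..` needs exactly one request and the server decides where the object ends at read time.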
   
   ### Additional Context
   
   _No response_
   
   ### Are you willing to submit a PR to fix this bug?
   
   - [ ] Yes, I would like to submit a PR.


