avarnon opened a new issue, #6868:
URL: https://github.com/apache/arrow-rs/issues/6868

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   
   My team found that `object_store`'s `AzureClient.put_block()` uses an 
incrementing counter to calculate the `content_id` and `block_id`. We use 
`object_store` inside a Rust-based web service, which means that multiple 
streams _could_ attempt to write to the same BLOB path in parallel. In our 
opinion, this could corrupt the file, because stream `a` could upload a block 
with the same content/block ID as stream `b`.
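
To make the concern concrete, here is a minimal sketch of two independent uploads that each derive IDs from their own counter. `CounterWriter` and `writers_collide` are hypothetical names for illustration, not `object_store` types; the width of 20 is an assumption, and the base64 step is left out so the sketch needs only the standard library.

```rust
/// Hypothetical stand-in for one multipart upload that derives its
/// content IDs from a per-upload counter starting at zero.
struct CounterWriter {
    next_part: usize,
}

impl CounterWriter {
    fn new() -> Self {
        Self { next_part: 0 }
    }

    /// Returns the next content ID: the counter, space-padded to 20 chars
    /// (the width is an assumption made for this sketch).
    fn next_content_id(&mut self) -> String {
        let id = format!("{:20}", self.next_part);
        self.next_part += 1;
        id
    }
}

/// Returns true if two independent writers emit identical IDs for
/// each of their first `n` parts.
fn writers_collide(n: usize) -> bool {
    let mut a = CounterWriter::new();
    let mut b = CounterWriter::new();
    (0..n).all(|_| a.next_content_id() == b.next_content_id())
}
```

Because each writer starts counting from zero, `writers_collide(n)` holds for any `n`: streams `a` and `b` would name their blocks identically when targeting the same path.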
   
   **Describe the solution you'd like**
   My team would like to see `AzureClient.put_block()` use a randomized 
content/block ID to prevent collisions.
   
   **Describe alternatives you've considered**
   We considered using Azure BLOB leases to prevent concurrent writes but 
determined that this would be cost-prohibitive.
   
   **Additional context**
   ```rust
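   use base64::prelude::*; // BASE64_STANDARD and the `Engine` trait
   use rand::Rng; // brings `gen` into scope
   // Draw a random 128-bit part index instead of using an incrementing counter.
   let part_idx = u128::from_be_bytes(rand::thread_rng().gen());
   let content_id = format!("{part_idx:40}");
   let block_id = BASE64_STANDARD.encode(&content_id);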
   ```
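
A note on the width of 40 in the snippet above: as far as we understand Azure's Put Block rules, all block IDs within one blob must have the same length, so the padding matters. The sketch below (standard library only, with `content_id` as a hypothetical helper name) checks that every `u128` formats to exactly 40 characters, since the largest value has 39 decimal digits.

```rust
/// Formats a part index the way the snippet above does: right-aligned
/// and space-padded to a width of 40 characters.
fn content_id(part_idx: u128) -> String {
    format!("{part_idx:40}")
}

/// Number of decimal digits in the largest possible u128.
fn max_u128_digits() -> usize {
    u128::MAX.to_string().len()
}
```

Since `max_u128_digits()` is below 40, no random value can overflow the padded width, and the base64-encoded IDs all come out the same length.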
   

