Tom-Newton commented on issue #461:
URL: 
https://github.com/apache/arrow-rs-object-store/issues/461#issuecomment-3172923383

   > I'm not sure what service the API you've linked to actually corresponds 
to, but the API we use is 
[here](https://learn.microsoft.com/en-us/rest/api/storageservices/list-blobs?view=rest-storageservices-datalakestoragegen2-2019-12-12&tabs=microsoft-entra-id)
 and does not appear to have gained the necessary support.
   > 
   > _FWIW the Azure offering in this space is extremely confused, there are 4 
different products all lumped together in a confusing mess, I don't blame you 
for your confusion. I guess having a coherent product offering isn't 
sufficiently enterprise_
   
   Yeah... I'm more familiar with it than I would like to be 😅 
   
   It looks like you use the standard blob service API 
(`<account-name>.blob.core.windows.net`). The API I linked 
(`<account-name>.dfs.core.windows.net`) is available on blob storage accounts 
with the hierarchical namespace feature enabled. Azure calls this "Azure 
Data Lake Gen2", but I consider that just marketing: really it's normal blob 
storage with an extra layer enabled for managing directories. 
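
To make the distinction concrete, here is a minimal sketch of how the two host names relate to one storage account. The enum and function names are hypothetical and are not taken from `object_store` or Arrow C++; only the two host patterns come from the comment above.

```rust
/// The two REST endpoints discussed above, for a single storage account.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AzureEndpoint {
    /// Standard blob service API.
    Blob,
    /// API available when the hierarchical namespace feature is enabled.
    Dfs,
}

/// Build the host name for `account` on the chosen endpoint.
/// `account` is a caller-supplied placeholder, not a real account.
fn endpoint_host(account: &str, endpoint: AzureEndpoint) -> String {
    let service = match endpoint {
        AzureEndpoint::Blob => "blob",
        AzureEndpoint::Dfs => "dfs",
    };
    format!("{account}.{service}.core.windows.net")
}
```

A client that wants to use both APIs would keep one account name and pick the host per request, for example `endpoint_host("myaccount", AzureEndpoint::Dfs)`.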
   
   I think providing a good experience to those of us stuck with Azure will 
require using both of these APIs. For example, this is how we implemented the 
`AzureFileSystem` for Arrow C++: 
https://github.com/apache/arrow/blob/97c9bfcdf8a9b864414fb5457a1c3f7a5747a3f1/cpp/src/arrow/filesystem/azurefs.cc#L1688-L1691
 
   
   > Edit: I would also highlight the new 
[PaginatedListStore](https://docs.rs/object_store/latest/object_store/list/trait.PaginatedListStore.html)
 which is sufficient if the desire is just for stateless pagination.
   
   Thanks for the suggestion. I got excited for a moment when I saw the 
`offset` argument but, understandably, it's [not supported on 
Azure](https://github.com/apache/arrow-rs-object-store/blob/94c25d2dea15d2a7154bb166ae58cbf9452ebcd9/src/azure/client.rs#L978)
   
   My need is a single list operation starting from a known filename. 
Specifically this 
https://github.com/delta-io/delta-rs/blob/2920177ac5215e192e0182bed93c42c0b4a98b6f/crates/core/src/kernel/snapshot/log_segment.rs#L438.
 I believe some of the others who've shown interest in this issue have the same 
motivation, because using the default implementation of `list_with_offset` 
results in terrible performance. 
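
The performance gap can be illustrated with a toy model. This is only a sketch under stated assumptions: `PagedStore` and its methods are hypothetical stand-ins, not `object_store` APIs, and the "server-side" variant assumes a backend that can seek directly to the first key after the offset. The client-side variant models the approach of listing everything under the prefix and discarding keys at or before the offset.

```rust
/// Toy object store: sorted keys served in fixed-size pages.
/// (Hypothetical type, for illustration only.)
struct PagedStore {
    keys: Vec<String>,
    page_size: usize,
}

/// Build a store holding `versions` Delta-style log file names, which sort
/// lexicographically because the version number is zero-padded.
fn delta_log_store(versions: usize, page_size: usize) -> PagedStore {
    assert!(page_size > 0, "page_size must be positive");
    let keys = (0..versions)
        .map(|v| format!("_delta_log/{v:020}.json"))
        .collect();
    PagedStore { keys, page_size }
}

impl PagedStore {
    /// Return one page starting at index `start`, plus the next start index
    /// if more keys remain.
    fn page(&self, start: usize) -> (&[String], Option<usize>) {
        let end = (start + self.page_size).min(self.keys.len());
        let next = if end < self.keys.len() { Some(end) } else { None };
        (&self.keys[start..end], next)
    }

    /// Client-side filtering: fetch every page, keep only keys > `offset`.
    /// Returns the matching keys and the number of page requests made.
    fn list_after_client_side(&self, offset: &str) -> (Vec<String>, usize) {
        let mut out = Vec::new();
        let mut requests = 0;
        let mut cursor = Some(0);
        while let Some(start) = cursor {
            let (page, next) = self.page(start);
            requests += 1;
            out.extend(page.iter().filter(|k| k.as_str() > offset).cloned());
            cursor = next;
        }
        (out, requests)
    }

    /// Assumed server-side "start after": the backend seeks to the first
    /// key > `offset`, so only the tail of the listing is paged through.
    fn list_after_server_side(&self, offset: &str) -> (Vec<String>, usize) {
        let first = self.keys.partition_point(|k| k.as_str() <= offset);
        let mut out = Vec::new();
        let mut requests = 0;
        let mut cursor = Some(first);
        while let Some(start) = cursor {
            let (page, next) = self.page(start);
            requests += 1;
            out.extend(page.iter().cloned());
            cursor = next;
        }
        (out, requests)
    }
}
```

With 10,000 log files, a page size of 1,000, and an offset at version 9,989, both methods return the same 10 keys, but the client-side variant issues 10 page requests while the server-side variant issues 1. The cost of the client-side approach grows with the total number of files under the prefix rather than with the number of files after the offset.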
   


