lhoestq commented on issue #45214:
URL: https://github.com/apache/arrow/issues/45214#issuecomment-4372015477

   Here is another reproducible example, using pyarrow 24.0.0, that was shared in https://github.com/huggingface/huggingface_hub/issues/4178:
   
   ```python
   from datasets import load_dataset
   
   from datasets import __version__ as ds_version
   from huggingface_hub import __version__ as hf_version
   from pyarrow import __version__ as pa_version
   
   print(f"datasets: {ds_version}")
   print(f"huggingface_hub: {hf_version}")
   print(f"pyarrow: {pa_version}")
   
   ds = load_dataset("IRIIS-RESEARCH/Nepali-Text-Corpus", split="train", streaming=True)
   print(next(iter(ds)))
   ```
   This prints the following and then hangs:
   ```
   datasets: 4.8.5
   huggingface_hub: 1.13.0
   pyarrow: 24.0.0
   Resolving data files: 100%|████████████████████████████████████| 46/46 [00:00<00:00, 20887.52it/s]
   Resolving data files: 100%|████████████████████████████████████| 46/46 [00:00<00:00, 36191.71it/s]
   {'index': 5176754, 'Article': ' बिहीबार दिउँसो खोला [...] नकारी दिए ।', 'Source': 'nepalkhabar.com'}
   ```
   
   `HfFileSystem`, which is used under the hood, is based on `httpx`, so the hang could come from an issue with combining threading and `httpx`.
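
One way to narrow this down, as a rough sketch only: the standard-library `faulthandler` module can dump every thread's stack once the script has been stuck for a while, and `threading.enumerate()` can list which non-daemon threads are still alive after the first example is printed. The helper names `arm_hang_watchdog` and `live_thread_names` are hypothetical, and the idea that a lingering non-daemon thread is what keeps the process from exiting is an assumption, not something confirmed here.

```python
import faulthandler
import sys
import threading


def arm_hang_watchdog(timeout_s: float = 30.0, file=sys.stderr) -> None:
    """Ask faulthandler to dump all thread stacks if we are still running after timeout_s.

    exit=False keeps the process alive after the dump so it can still be inspected.
    """
    faulthandler.dump_traceback_later(timeout_s, exit=False, file=file)


def live_thread_names() -> list[str]:
    """Return the names of running non-daemon threads other than the main thread.

    Assumption: threads like these are candidates for what blocks interpreter exit.
    """
    return [
        t.name
        for t in threading.enumerate()
        if t is not threading.main_thread() and not t.daemon
    ]
```

In the script above, `arm_hang_watchdog()` would go before the `load_dataset(...)` call and `print(live_thread_names())` after `print(next(iter(ds)))`. If the stack dump shows the main thread waiting on a thread join at shutdown, that would be consistent with the threading hypothesis; `faulthandler.cancel_dump_traceback_later()` disarms the watchdog if the script exits normally.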


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
