lhoestq commented on issue #45214: URL: https://github.com/apache/arrow/issues/45214#issuecomment-4372015477
Here is another reproducible example using pyarrow 24.0.0 that was shared in https://github.com/huggingface/huggingface_hub/issues/4178:

```python
from datasets import load_dataset
from datasets import __version__ as ds_version
from huggingface_hub import __version__ as hf_version
from pyarrow import __version__ as pa_version

print(f"datasets: {ds_version}")
print(f"huggingface_hub: {hf_version}")
print(f"pyarrow: {pa_version}")

ds = load_dataset("IRIIS-RESEARCH/Nepali-Text-Corpus", split="train", streaming=True)
print(next(iter(ds)))
```

which prints this and then hangs:

```
datasets: 4.8.5
huggingface_hub: 1.13.0
pyarrow: 24.0.0
Resolving data files: 100%|████████████████████████████████████| 46/46 [00:00<00:00, 20887.52it/s]
Resolving data files: 100%|████████████████████████████████████| 46/46 [00:00<00:00, 36191.71it/s]
{'index': 5176754, 'Article': ' बिहीबार दिउँसो खोला [...] नकारी दिए ।', 'Source': 'nepalkhabar.com'}
```

`HfFileSystem` used under the hood is based on `httpx`, so this could come from a threading + httpx combination issue.
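
One way to check the threading hypothesis is to dump every thread's Python stack once the process has been stuck for a while. Below is a minimal sketch that uses only the standard library's `faulthandler`. The helper names `arm_hang_watchdog` and `disarm_hang_watchdog` and the 60-second default are illustrative assumptions, not part of `datasets`, `huggingface_hub`, or pyarrow:

```python
import faulthandler
import sys


def arm_hang_watchdog(timeout_s: float = 60.0, file=None) -> None:
    """Schedule a dump of all threads' Python tracebacks after timeout_s seconds.

    The process is not killed (exit=False), so the hang can still be inspected.
    `file` must expose a real file descriptor; it defaults to sys.stderr.
    """
    target = sys.stderr if file is None else file
    faulthandler.dump_traceback_later(timeout_s, exit=False, file=target)


def disarm_hang_watchdog() -> None:
    """Cancel the pending dump, e.g. when the script finishes normally."""
    faulthandler.cancel_dump_traceback_later()
```

Calling `arm_hang_watchdog()` before `load_dataset(...)` in the script above, and `disarm_hang_watchdog()` after the final `print`, should write the stack of every live Python thread if the hang lasts longer than the timeout. Note that `faulthandler` only reports Python-level frames, so a thread blocked inside pyarrow's C++ code would show just the Python call that entered it.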
