[
https://issues.apache.org/jira/browse/ARROW-5072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16931552#comment-16931552
]
Martin Durant commented on ARROW-5072:
--------------------------------------
Ideally, you should write within a context manager, like
{code:python}
with s3.open(...) as f:
    write_with(f)
{code}
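For example (a sketch only; the bucket/key below is a placeholder):
{code:python}
# Writing inside the context manager means close() runs in the caller's frame,
# so any S3 error (e.g. NoSuchBucket) is raised there instead of being
# swallowed by __del__.
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
from s3fs import S3FileSystem

s3 = S3FileSystem()
table = pa.Table.from_pandas(pd.DataFrame({'col0': []}))

with s3.open('some-bogus-bucket/df.parquet', 'wb') as f:
    pq.write_table(table, f)  # errors surface when the with-block closes the file
{code}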
Actually, the first call to the S3 API happens when the first data is sent,
which is when the buffer first fills beyond the configured block size -
this is an important optimisation. You are welcome to propose that the method
in question, `._initiate_upload()`, be called during file instance creation.
That could be done within s3fs, or within fsspec (in which case it would
propagate to all file implementations that derive from AbstractBufferedFile).
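A rough sketch of that idea (a hypothetical subclass, not a drop-in patch - a
real change would also have to ensure flush() does not initiate the upload a
second time):
{code:python}
from s3fs import S3File

class EagerS3File(S3File):
    # Hypothetical: start the upload at construction time so that errors such
    # as NoSuchBucket surface immediately rather than only at close().
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        if self.mode in ('wb', 'ab'):
            # normally deferred until the write buffer exceeds the block size
            self._initiate_upload()
{code}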
Note that once you call `flush(force=True)`, you cannot write any more to the
file; this is equivalent to closing it.
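For example (again a sketch, reusing the placeholder bucket from the issue):
{code:python}
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
from s3fs import S3FileSystem

s3 = S3FileSystem()
table = pa.Table.from_pandas(pd.DataFrame({'col0': []}))

f = s3.open('some-bogus-bucket/df.parquet', 'wb')
pq.write_table(table, f)
f.flush(force=True)  # uploads the buffered data; S3 errors are raised here
# after a forced flush the file cannot be written to again, only closed
f.close()
{code}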
> [Python] write_table fails silently on S3 errors
> ------------------------------------------------
>
> Key: ARROW-5072
> URL: https://issues.apache.org/jira/browse/ARROW-5072
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.12.1
> Environment: Python 3.6.8
> Reporter: Paul George
> Priority: Minor
> Labels: filesystem, parquet
> Fix For: 0.15.0
>
>
> {{pyarrow==0.12.1}}
> *pyarrow.parquet.write_table* called with where=S3File(...) fails silently
> when encountering errors while writing to S3 (in the example below, boto3 is
> raising a NoSuchBucket exception). However, instead of using S3File(),
> calling write_table with where=_<filepath>_ and with
> filesystem=S3FileSystem() does *not* fail silently and raises, as is expected.
> h4. Code/Repro
>
> {code:java}
> import pandas as pd
> import pyarrow as pa
> import pyarrow.parquet as pq
> from s3fs import S3File, S3FileSystem
>
> df = pd.DataFrame({'col0': []})
> s3_filepath = 's3://some-bogus-bucket/df.parquet'
>
> print('>> test 1')
> try:
>     # use S3File --> fails silently
>     pq.write_table(pa.Table.from_pandas(df.copy()),
>                    S3File(S3FileSystem(), s3_filepath, mode='wb'))
> except Exception:
>     print('>>>> Exception raised!')
> else:
>     print('>>>> Exception **NOT** raised!')
>
> print('>> test 2')
> try:
>     # use filepath and S3FileSystem --> raises Exception, as expected
>     pq.write_table(pa.Table.from_pandas(df.copy()),
>                    s3_filepath,
>                    filesystem=S3FileSystem())
> except Exception:
>     print('>>>> Exception raised!')
> else:
>     print('>>>> Exception **NOT** raised!')
> {code}
>
> h4. Output
> {code:java}
> >> test 1
> Exception ignored in: <bound method S3File.__del__ of <S3File some-bogus-bucket/df.parquet>>
> Traceback (most recent call last):
>   File "<redacted>/lib/python3.6/site-packages/s3fs/core.py", line 1476, in __del__
>     self.close()
>   File "<redacted>/lib/python3.6/site-packages/s3fs/core.py", line 1454, in close
>     raise_from(IOError('Write failed: %s' % self.path), e)
>   File "<string>", line 3, in raise_from
> OSError: Write failed: some-bogus-bucket/df.parquet
> >>>> Exception **NOT** raised!
> >> test 2
> >>>> Exception raised!
> Exception ignored in: <bound method S3File.__del__ of <S3File some-bogus-bucket/df.parquet>>
> Traceback (most recent call last):
>   File "<redacted>/lib/python3.6/site-packages/s3fs/core.py", line 1476, in __del__
>     self.close()
>   File "<redacted>/lib/python3.6/site-packages/s3fs/core.py", line 1454, in close
>     raise_from(IOError('Write failed: %s' % self.path), e)
>   File "<string>", line 3, in raise_from
> OSError: Write failed: some-bogus-bucket/df.parquet
> {code}