[ https://issues.apache.org/jira/browse/BEAM-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Pavlo Zhukov updated BEAM-7411:
-------------------------------
Description:
To reduce the size of uploaded files, we decided to gzip them before uploading.
Unfortunately, we noticed that the uploaded files' metadata does not include
Content-Encoding 'gzip'. I rechecked the code and found that there is no way
to pass a gzip encoding to
{code:python}
apache_beam.io.gcp.gcsio.GcsIO.open()
{code}
I also noticed that apache_beam.io.gcp.gcsio.GcsUploader does not support
uploading gzip-encoded files.
To resolve this, we need to allow passing a gzip_encoded option, which can be
forwarded to apitools.base.py.transfer in
{code:python}
GcsUploader.__init__()
{code}
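The requested change amounts to threading one extra keyword from open() through the uploader's constructor down to the transfer object. The sketch below illustrates only that plumbing, under stated assumptions: SketchGcsIO, SketchGcsUploader and FakeTransferUpload are hypothetical stand-ins, not the real Beam or apitools classes, and the real classes do considerably more work that is omitted here. Defaulting the flag to False keeps existing callers unaffected.

{code:python}
class FakeTransferUpload(object):
    """Hypothetical stand-in for the apitools upload object.

    It only records the arguments it was constructed with.
    """

    def __init__(self, stream, mime_type, gzip_encoded=False):
        self.stream = stream
        self.mime_type = mime_type
        self.gzip_encoded = gzip_encoded


class SketchGcsUploader(object):
    """Hypothetical uploader that accepts the proposed gzip_encoded flag."""

    def __init__(self, path, mime_type, gzip_encoded=False):
        self.path = path
        # Forward the flag unchanged to the (fake) transfer object.
        self.upload = FakeTransferUpload(
            stream=None, mime_type=mime_type, gzip_encoded=gzip_encoded)


class SketchGcsIO(object):
    """Hypothetical GcsIO-like facade whose open() accepts gzip_encoded."""

    def open(self, path, mode='r', mime_type='application/octet-stream',
             gzip_encoded=False):
        if mode not in ('w', 'wb'):
            raise ValueError('this sketch only models write mode')
        # Default False: callers that do not pass the flag see no change.
        return SketchGcsUploader(path, mime_type, gzip_encoded=gzip_encoded)
{code}

For example, SketchGcsIO().open('gs://my-bucket/doc.pdf', 'w', mime_type='application/pdf', gzip_encoded=True).upload.gzip_encoded is True, while omitting the keyword leaves it False (the bucket path is a placeholder).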
Is there any possibility that the required changes could be applied soon?
*What steps will reproduce the problem?*
1. Prepare a gzip-encoded file, for example a PDF.
2. Upload it to GCS using:
{code:python}
from apache_beam.io.gcp.gcsio import GcsIO


def upload_gzipped_pdf(gzipped_pdf, path):
    # gzipped_pdf holds bytes that were already compressed with gzip.
    with GcsIO().open(path, 'w') as f:
        f.write(gzipped_pdf)
{code}
3. Try to download the uploaded file via a browser.
*What is the expected result?*
I can see the file content properly.
*What happens instead?*
I get a broken document.
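The broken download is consistent with how HTTP content coding works: a client decompresses a response transparently only when it carries Content-Encoding: gzip, and otherwise hands the raw gzip bytes to the viewer as if they were the document. The standard-library sketch below illustrates this under simplifying assumptions: the placeholder bytes stand in for a real PDF, client_view() is a hypothetical model of a client rather than real browser code, and the %PDF- prefix check is a rough heuristic, not a PDF validator.

{code:python}
import gzip

# Placeholder bytes standing in for a real PDF file.
original_pdf = b'%PDF-1.4\n% placeholder content, not a real document\n%%EOF\n'

# Step 1 of the report: gzip the document before uploading it.
gzipped_pdf = gzip.compress(original_pdf)


def looks_like_pdf(data):
    """Rough check: PDF files begin with the '%PDF-' header."""
    return data.startswith(b'%PDF-')


def client_view(body, content_encoding=None):
    """Hypothetical model of what a client passes on to the viewer.

    The body is decoded only when Content-Encoding says it is gzip;
    otherwise the raw bytes are passed through unchanged.
    """
    if content_encoding == 'gzip':
        return gzip.decompress(body)
    return body


print(gzipped_pdf[:2] == b'\x1f\x8b')  # True: gzip magic number
print(looks_like_pdf(client_view(gzipped_pdf)))  # False: "broken document"
print(looks_like_pdf(client_view(gzipped_pdf, 'gzip')))  # True
{code}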
*Possible resolution after implementing the requested changes*
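{code:python}
from apache_beam.io.gcp.gcsio import GcsIO


def upload_gzipped_pdf(gzipped_pdf, path):
    # gzip_encoded is the proposed new parameter; GcsIO.open() does not
    # accept it at the time of this report.
    with GcsIO().open(path, 'w', gzip_encoded=True) as f:
        f.write(gzipped_pdf)
{code}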
> Allow upload gzipped files via apache_beam.io.gcp.gcsio.GcsIO with proper content-encoding
> ------------------------------------------------------------------------------------------
>
> Key: BEAM-7411
> URL: https://issues.apache.org/jira/browse/BEAM-7411
> Project: Beam
> Issue Type: Improvement
> Components: io-python-gcp
> Reporter: Pavlo Zhukov
> Priority: Major
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)