[ 
https://issues.apache.org/jira/browse/AVRO-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fokko Driesprong resolved AVRO-2203.
------------------------------------
    Resolution: Cannot Reproduce

Can't open the link provided. Please resubmit the issue if it still persists 
with Avro 1.8.2.

> avro module in python generates different bytes while writing file to local 
> storage and s3 
> -------------------------------------------------------------------------------------------
>
>                 Key: AVRO-2203
>                 URL: https://issues.apache.org/jira/browse/AVRO-2203
>             Project: Apache Avro
>          Issue Type: Bug
>          Components: python
>    Affects Versions: 1.8.0
>         Environment: S3, UNIX, HDFS, python
>            Reporter: Vinuthna
>            Priority: Blocker
>
> Hi, 
> I am trying to convert a CSV file to Avro format and store it on S3 storage 
> using Python. During this process, I see data loss in the file written to 
> S3 storage. I confirmed this by converting both the Avro file on local 
> storage and the Avro file on S3 storage to JSON format and comparing the 
> content and the total number of lines in each file. 
> Further investigation shows that the Avro data generated while writing to 
> local storage is not exactly the same as the Avro data generated while 
> writing to S3 storage. 
> I suspect the issue is in obtaining a writer object using DatumWriter: 
> writer = avro.datafile.DataFileWriter(<fileobject>, avro.io.DatumWriter(), 
> schema)
> The exact code is at the GitHub link below: 
> https://github.com/mpenkov/smart_open/blob/209/integration-tests/test_209.py
> Could you please help solve this issue?
>  
> Thanks
> Vinuthna
>  
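
The byte-level comparison described in the report can be sketched roughly as 
below. This is only an illustration using the standard library: the helper 
names first_difference and compare_streams are hypothetical, and it assumes 
both copies of the file have already been opened as binary file-like objects. 
Note that an Avro container file embeds a randomly generated 16-byte sync 
marker, so two files written in separate runs are not expected to be 
byte-identical even when they hold the same records. A remote copy that is 
shorter than the local one and diverges only at the end would be the more 
telling symptom.

```python
import hashlib
import io


def first_difference(a: bytes, b: bytes):
    """Return the offset of the first differing byte, or None if equal."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    # One input is a prefix of the other: they diverge where the shorter ends.
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


def compare_streams(local_fileobj, remote_fileobj):
    """Summarize how two binary streams differ (sizes, digests, offset)."""
    local_bytes = local_fileobj.read()
    remote_bytes = remote_fileobj.read()
    return {
        "local_size": len(local_bytes),
        "remote_size": len(remote_bytes),
        "local_sha256": hashlib.sha256(local_bytes).hexdigest(),
        "remote_sha256": hashlib.sha256(remote_bytes).hexdigest(),
        "first_difference": first_difference(local_bytes, remote_bytes),
    }


# Placeholder bytes standing in for the two files; not real Avro data.
report = compare_streams(
    io.BytesIO(b"Obj\x01abcdef"),
    io.BytesIO(b"Obj\x01abc"),
)
print(report["local_size"], report["remote_size"], report["first_difference"])
```

In practice the two arguments would be the local file opened in "rb" mode and 
whatever binary stream is used to read the object back from S3.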



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
