Vinuthna created AVRO-2203:
------------------------------

             Summary: avro module in python generates different bytes while 
writing file to local storage and s3 
                 Key: AVRO-2203
                 URL: https://issues.apache.org/jira/browse/AVRO-2203
             Project: Avro
          Issue Type: Bug
          Components: python
    Affects Versions: 1.8.0
         Environment: S3, UNIX, HDFS, python
            Reporter: Vinuthna


Hi, 

I am trying to convert a CSV file to Avro format and store it on S3 using 
Python. During this process, data is lost in the file written to S3. I 
confirmed this by converting both the Avro file on local storage and the Avro 
file on S3 to JSON and comparing their content and total number of lines. 
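
A minimal sketch of that kind of comparison, assuming each Avro file has already been dumped to JSON with one record per line. The function name compare_json_lines and the result keys are hypothetical and are not taken from the linked test script.

```python
import json


def compare_json_lines(local_lines, s3_lines):
    """Compare two iterables of JSON-encoded records, one record per line.

    Hypothetical helper: returns the record count of each side and the index
    of the first record that differs among the records both sides share
    (None if the shared records are all equal).
    """
    local = [json.loads(line) for line in local_lines if line.strip()]
    remote = [json.loads(line) for line in s3_lines if line.strip()]
    first_mismatch = next(
        (i for i, (a, b) in enumerate(zip(local, remote)) if a != b),
        None,
    )
    return {
        "local_count": len(local),
        "s3_count": len(remote),
        "first_mismatch": first_mismatch,
    }
```

A lower s3_count than local_count with no mismatch would suggest that records are missing from the end of the S3 copy rather than altered.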

Further investigation shows that the Avro data generated while writing to 
local storage is not exactly the same as the Avro data generated while 
writing to S3. 
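
One way to locate where two outputs diverge is sketched below, assuming both files fit in memory as bytes objects; first_difference is a hypothetical helper name. Note that an Avro container file embeds a randomly generated 16-byte sync marker, so two files produced by separate writer instances are expected to differ at the marker positions even when the records are identical. A byte-level comparison is therefore most informative when the same serialized bytes are sent to both destinations.

```python
def first_difference(a, b):
    """Return the offset of the first differing byte between a and b.

    Returns None when the two byte strings are identical. If one is a
    strict prefix of the other, returns the length of the shorter one.
    """
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        return min(len(a), len(b))
    return None
```

If the offset returned equals the length of the S3 copy, the S3 copy is a truncated prefix of the local file, which would point at the upload path rather than at the Avro encoding.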

I suspect the issue is in obtaining a writer object using DatumWriter: 

writer = avro.datafile.DataFileWriter(<fileobject>, avro.io.DatumWriter(), schema)

The exact code is available at the GitHub link below: 

https://github.com/mpenkov/smart_open/blob/209/integration-tests/test_209.py

Could you please help solve this issue?

 

Thanks

Vinuthna

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
