[ 
https://issues.apache.org/jira/browse/AVRO-219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795751#action_12795751
 ] 

Jeff Hammerbacher commented on AVRO-219:
----------------------------------------

bq. please package this as a single patch file that replaces the existing 
implementation.

Sure, I will address this on Sunday, when I return to the States.

bq. some of the TODO's seem critical, like skip_int.

skip_int and skip_long are copied from the old Python implementation. I believe 
they are broken, but this patch doesn't introduce the problem. I plan to add 
tests and sort out that issue soon, but can I address the TODOs in separate 
JIRAs? Blocking the commit of this patch for TODO scrubbing will mean more work 
outside of Apache's SVN.
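
To make the skip_int/skip_long concern concrete: Avro writes ints and longs as 
variable-length zig-zag values, so skipping one means consuming bytes until one 
arrives with the high bit clear. Below is a minimal sketch of that idea. It is 
not the code in the patch; write_long and read_long are illustrative helpers 
added only so the skip can be exercised, and the sketch assumes Python 3 and a 
file-like object such as io.BytesIO.

```python
import io


def write_long(stream, n):
    """Write n as a zig-zag, variable-length integer (illustrative helper)."""
    n = (n << 1) ^ (n >> 63)  # zig-zag: small magnitudes get short encodings
    while n & ~0x7F:
        stream.write(bytes([(n & 0x7F) | 0x80]))  # high bit set: more bytes follow
        n >>= 7
    stream.write(bytes([n]))


def read_long(stream):
    """Read one zig-zag, variable-length integer (illustrative helper)."""
    result = 0
    shift = 0
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("stream ended inside a variable-length integer")
        result |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:
            break
        shift += 7
    return (result >> 1) ^ -(result & 1)  # undo the zig-zag step


def skip_long(stream):
    """Advance past one variable-length integer without decoding it."""
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("stream ended inside a variable-length integer")
        if not byte[0] & 0x80:  # high bit clear marks the final byte
            return


# ints and longs share the same wire format, so one skip routine serves both.
skip_int = skip_long
```

A test along these lines would write a few values, skip the leading ones, and 
check that the next read returns the expected value.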

bq. those big if .. elif expressions in read_data, write_data and skip_data 
look like performance pits.

The comments on that blog post point out that a big if/(elif)+/else block is 
the standard way to approximate switch/case in Python. Simon's idiom is less 
popular in Python code I've seen. The previous implementation built a dict of 
function calls, similar to the blog post you point out, and I found that to be 
unnecessarily complex. My goal with the Python code is to be correct, concise, 
and easy to understand first, and fast second. Can we keep the current approach 
and benchmark it in AVRO-217?
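
For comparison, here is a toy sketch of the two dispatch styles under 
discussion. The reader functions and type names are hypothetical stand-ins, 
not the functions in the patch, and the "raw" argument is just a bytes value 
rather than a real decoder.

```python
def read_null(raw):
    return None


def read_boolean(raw):
    return raw == b"\x01"


def read_string(raw):
    return raw.decode("utf-8")


def read_data_elif(type_name, raw):
    """Dispatch with an if/elif chain: every case is visible in one place."""
    if type_name == "null":
        return read_null(raw)
    elif type_name == "boolean":
        return read_boolean(raw)
    elif type_name == "string":
        return read_string(raw)
    else:
        raise ValueError("unknown type: %s" % type_name)


# Dispatch through a dict of functions: one lookup instead of a chain of
# comparisons, at the cost of an extra level of indirection for the reader.
READERS = {
    "null": read_null,
    "boolean": read_boolean,
    "string": read_string,
}


def read_data_dict(type_name, raw):
    try:
        reader = READERS[type_name]
    except KeyError:
        raise ValueError("unknown type: %s" % type_name)
    return reader(raw)
```

Both functions return the same results for the same inputs, so swapping one 
for the other later should be a local change if benchmarks call for it.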

bq. validate is overkill for picking the union branch.

Your suggestion sounds like a performance optimization to avoid calling 
validate() many times, but one that would further obscure what the code 
does. I don't think it's a good idea at this time, given the above-stated aims 
of the Python implementation. If I've misunderstood your intent, please correct 
me.
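
As an illustration of the approach being defended, here is a simplified sketch 
of picking a union branch by validating the datum against each branch in 
order. It assumes schemas are plain type-name strings and covers only a few 
primitive types; the validate() in the patch works on schema objects, and 
pick_union_branch is a hypothetical name.

```python
INT_MIN = -(2 ** 31)
INT_MAX = 2 ** 31 - 1


def validate(type_name, datum):
    """Return True if datum is an acceptable value for the named type."""
    if type_name == "null":
        return datum is None
    elif type_name == "boolean":
        return isinstance(datum, bool)
    elif type_name == "int":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return (isinstance(datum, int) and not isinstance(datum, bool)
                and INT_MIN <= datum <= INT_MAX)
    elif type_name == "string":
        return isinstance(datum, str)
    return False


def pick_union_branch(branches, datum):
    """Return the index of the first branch that validates the datum."""
    for index, branch in enumerate(branches):
        if validate(branch, datum):
            return index
    raise ValueError("datum %r matches no branch of %r" % (datum, branches))
```

The cost is one validate() call per branch until a match is found, which is 
the overhead the suggestion aims to avoid.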

> Rewrite Python implementation's IO path (schema.py, io.py, genericio.py, 
> datafile.py) and associated tests
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-219
>                 URL: https://issues.apache.org/jira/browse/AVRO-219
>             Project: Avro
>          Issue Type: Improvement
>          Components: python
>            Reporter: Jeff Hammerbacher
>            Assignee: Jeff Hammerbacher
>         Attachments: AVRO-219-schema-io-and-datafile.patch, 
> AVRO-219.patch.schema, AVRO-219.patch.schema_and_io
>
>
> Currently, the unit tests for schema.py, genericio.py, and datafile.py are 
> grouped in with the unit tests for io.py in testio.py. We should break the 
> tests into individual files so that we have better modularization of tests.
