[ https://issues.apache.org/jira/browse/AVRO-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13638198#comment-13638198 ]

Jeremy Kahn commented on AVRO-1304:
-----------------------------------

Uri, what strategy are you using to try to fix this? Could we memoize the 
partner schema to short-circuit out of match_schemas (trading a small amount of 
memory for speed)?
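
As a rough sketch of the memoization idea only (not the actual avro code): 
MemoizingReader and slow_match below are hypothetical stand-ins, with 
slow_match playing the role of DatumReader.match_schemas. The check result is 
cached and recomputed only after one of the schema members is reassigned.

```python
# Sketch only: MemoizingReader and slow_match are hypothetical stand-ins,
# not part of the avro package.

class MemoizingReader:
    """Caches the schema-compatibility check for the reader's own schemas."""

    def __init__(self, writers_schema, readers_schema, match_fn):
        self._writers_schema = writers_schema
        self._readers_schema = readers_schema
        self._match_fn = match_fn    # stand-in for DatumReader.match_schemas
        self._cached_match = None    # None means "not computed yet"

    @property
    def writers_schema(self):
        return self._writers_schema

    @writers_schema.setter
    def writers_schema(self, schema):
        self._writers_schema = schema
        self._cached_match = None    # invalidate when a member changes

    @property
    def readers_schema(self):
        return self._readers_schema

    @readers_schema.setter
    def readers_schema(self, schema):
        self._readers_schema = schema
        self._cached_match = None    # invalidate when a member changes

    def schemas_match(self):
        # Run the expensive check only if no cached result is available.
        if self._cached_match is None:
            self._cached_match = self._match_fn(
                self._writers_schema, self._readers_schema)
        return self._cached_match


calls = []  # records every invocation of the expensive check


def slow_match(writers_schema, readers_schema):
    # Placeholder comparison; the real check is far more involved.
    calls.append((writers_schema, readers_schema))
    return writers_schema == readers_schema


reader = MemoizingReader("string", "string", slow_match)
results = [reader.schemas_match() for _ in range(1000)]  # one real check
reader.readers_schema = "int"                            # invalidates cache
changed = reader.schemas_match()                         # second real check
```

The cost is one cached boolean per reader, and correctness depends on every 
schema change going through the property setters.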

I'm eager to improve the speed of the Python library, and a 20% speedup could 
shave days off my team's product delivery.

Contact me offline ([email protected]) if you'd like to share your profiling 
setup; I can try to implement related speedups.

> Python Avro match_schemas called redundantly
> --------------------------------------------
>
>                 Key: AVRO-1304
>                 URL: https://issues.apache.org/jira/browse/AVRO-1304
>             Project: Avro
>          Issue Type: Bug
>          Components: python
>    Affects Versions: 1.7.4
>            Reporter: Uri Laserson
>
> DatumReader.match_schemas(writers_schema, readers_schema) is called on every 
> single read from the DatumReader.  However, for almost every read, the 
> schemas used are the object members self.writers_schema and 
> self.readers_schema.  match_schemas should be checked only once in this case, 
> and only when the object members are modified.  This takes up 20% of my parse 
> time upon profiling.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
