[ https://issues.apache.org/jira/browse/BEAM-2810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Pablo Estrada resolved BEAM-2810.
---------------------------------
    Fix Version/s: Not applicable
       Resolution: Fixed

This has been fixed for a while, I believe.

> Consider a faster Avro library in Python
> ----------------------------------------
>
>                 Key: BEAM-2810
>                 URL: https://issues.apache.org/jira/browse/BEAM-2810
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>            Reporter: Eugene Kirpichov
>            Assignee: Ryan Williams
>            Priority: Major
>             Fix For: Not applicable
>
>          Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> https://stackoverflow.com/questions/45870789/bottleneck-on-data-source
> Seems like this job is reading Avro files (exported by BigQuery) at about
> 2 MB/s.
> We use the standard Python "avro" library which is apparently known to be
> very slow (10x+ slower than Java)
> http://apache-avro.679487.n3.nabble.com/Avro-decode-very-slow-in-Python-td4034422.html,
> and there are alternatives e.g. https://pypi.python.org/pypi/fastavro/

--
This message was sent by Atlassian Jira
(v8.3.4#803005)