[
https://issues.apache.org/jira/browse/SPARK-746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597973#comment-14597973
]
Joseph Batchik commented on SPARK-746:
--------------------------------------
Spark can currently serialize the three types of Avro records if the user
specifies Kryo. Specific and reflect records serialize just fine if the user
registers them ahead of time, since Kryo can efficiently deal with serializing
classes. The problem lies in generic records, since Kryo cannot serialize them
without a large amount of overhead. This causes issues for users who want to
use Avro records during a shuffle. To alleviate this, I implemented a custom
Kryo serializer for generic records that tries to reduce the amount of network
IO.
https://github.com/JDrit/spark/commit/6f1106bc20eb670e963d45a191dfc4517d46543b
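To illustrate the first part of that idea, here is a minimal, hypothetical sketch (JDK-only, not the actual patch) of compressing a schema's JSON once and caching the result, so the same schema sent numerous times only pays the compression cost the first time:

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.util.zip.{DeflaterOutputStream, InflaterInputStream}
import scala.collection.mutable

// Hypothetical helper mirroring the approach described above: compress a
// schema's JSON representation once, cache it, and reuse the compressed
// bytes for every subsequent message carrying the same schema.
object SchemaCompression {
  private val cache = mutable.Map.empty[String, Array[Byte]]

  def compress(schemaJson: String): Array[Byte] =
    cache.getOrElseUpdate(schemaJson, {
      val out = new ByteArrayOutputStream()
      val deflate = new DeflaterOutputStream(out)
      deflate.write(schemaJson.getBytes("UTF-8"))
      deflate.close()
      out.toByteArray
    })

  def decompress(bytes: Array[Byte]): String = {
    val in  = new InflaterInputStream(new ByteArrayInputStream(bytes))
    val out = new ByteArrayOutputStream()
    val buf = new Array[Byte](4096)
    var n = in.read(buf)
    while (n != -1) { out.write(buf, 0, n); n = in.read(buf) }
    new String(out.toByteArray, "UTF-8")
  }
}
```

The real serializer would of course operate on org.apache.avro.Schema objects inside a Kryo Serializer[GenericRecord]; this sketch only shows the compress-and-cache step in isolation.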
This works by sending a compressed form of the schema with each message rather
than having Kryo serialize the in-memory representation itself. Since the same
schema is going to be sent numerous times, it caches previously seen values to
reduce the computation needed. It also allows users to register their schemas
ahead of time, so it can send just the schema's unique ID with each message
instead of the entire schema.
Could I get some feedback on this approach? Please let me know if I am missing
anything important.
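The registration path can be sketched the same way. This is a hypothetical, JDK-only illustration (the real patch would use Avro's schema fingerprinting rather than a string hash): schemas registered ahead of time get a stable ID, so each serialized record can carry an 8-byte ID instead of the full schema, with a fallback to the compressed schema for unregistered ones.

```scala
import scala.collection.mutable

// Hypothetical registry sketch: schemas registered up front map to a
// stable ID in both directions, so serialization can write just the ID
// and deserialization can resolve it back to the schema.
object SchemaRegistry {
  private val idToSchema = mutable.Map.empty[Long, String]
  private val schemaToId = mutable.Map.empty[String, Long]

  // For illustration the ID is derived from the schema text's hash;
  // Avro's schema fingerprint would be used in practice.
  def register(schemaJson: String): Long = {
    val id = schemaJson.hashCode.toLong
    idToSchema(id) = schemaJson
    schemaToId(schemaJson) = id
    id
  }

  // Serialization side: prefer the ID if the schema was registered,
  // otherwise the caller falls back to sending the schema itself.
  def idFor(schemaJson: String): Option[Long] = schemaToId.get(schemaJson)

  // Deserialization side: resolve a received ID back to its schema.
  def schemaFor(id: Long): Option[String] = idToSchema.get(id)
}
```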
> Automatically Use Avro Serialization for Avro Objects
> -----------------------------------------------------
>
> Key: SPARK-746
> URL: https://issues.apache.org/jira/browse/SPARK-746
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Reporter: Patrick Cogan
>
> All generated objects extend org.apache.avro.specific.SpecificRecordBase (or
> there may be a higher up class as well).
> Since Avro records aren't JavaSerializable by default people currently have
> to wrap their records. It would be good if we could use an implicit
> conversion to do this for them.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)