[ https://issues.apache.org/jira/browse/HUDI-4959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexey Kudinkin updated HUDI-4959:
----------------------------------
Description:
Originally reported in:
https://github.com/apache/hudi/issues/6621
Kryo (used in SerializationUtils) by default allows objects to be serialized
without their class being registered with Kryo beforehand. In that case Kryo
encodes the first occurrence of an object of a given class with the full
class name, while subsequent occurrences use a class id assigned to it on
the fly.
This is a problem for durable serialization, where the serialized layout is
persisted. In this case we're trying to deserialize a file that doesn't have
the class name encoded, and since the user is running a different Spark job
to read it, the association isn't preserved in memory either.
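To make the failure mode concrete, here is a toy model of the "full name
first, id afterwards" scheme. It is only a sketch: the tag strings, the class
OnTheFlyClassIds and its Writer/Reader helpers are hypothetical and do not
reproduce Kryo's actual wire format or API.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: NOT Kryo's wire format, just the name-then-id idea.
class OnTheFlyClassIds {

    // Writer side: remembers which class names it has already emitted.
    static class Writer {
        private final Map<String, Integer> idsByName = new HashMap<>();

        // Returns the tag written in front of one object of the given class.
        String tagFor(String className) {
            Integer id = idsByName.get(className);
            if (id == null) {
                id = idsByName.size();
                idsByName.put(className, id);
                return "name:" + className + "#" + id; // first occurrence: full name
            }
            return "id:" + id; // later occurrences: id only
        }
    }

    // Reader side: can only resolve ids whose names it has read itself.
    static class Reader {
        private final Map<Integer, String> namesById = new HashMap<>();

        String resolve(String tag) {
            if (tag.startsWith("name:")) {
                int hash = tag.lastIndexOf('#');
                String name = tag.substring("name:".length(), hash);
                namesById.put(Integer.parseInt(tag.substring(hash + 1)), name);
                return name;
            }
            int id = Integer.parseInt(tag.substring("id:".length()));
            String name = namesById.get(id);
            if (name == null) {
                throw new IllegalStateException("Unknown class id " + id);
            }
            return name;
        }
    }

    public static void main(String[] args) {
        Writer writer = new Writer();
        String first = writer.tagFor("org.example.Key");  // carries the class name
        String second = writer.tagFor("org.example.Key"); // carries only the id
        System.out.println(first + " / " + second);

        // A reader in another process that only sees the second payload
        // has no way to map the id back to a class.
        Reader freshReader = new Reader();
        try {
            freshReader.resolve(second);
        } catch (IllegalStateException e) {
            System.out.println("Fresh reader failed: " + e.getMessage());
        }
    }
}
```

A reader that sees both payloads in order resolves them fine; only the reader
that starts from the id-only payload fails, which mirrors the situation
described above.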
*NOTE: We should be using custom serialization sequences for every object we
serialize for durable persistence, and avoid using frameworks like Kryo for
that.*
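As an illustration of what a custom serialization sequence could look like,
here is a minimal sketch using only java.io. It is not the actual fix:
KeyRecord is a hypothetical stand-in for a HoodieKey-like pair of strings, and
the field names, the DurableKeyCodec class and the version marker are all
assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch: an explicit, self-describing byte layout that does not
// depend on any in-memory class registry.
class DurableKeyCodec {

    // Stand-in for a HoodieKey-like object (field names are assumptions).
    static final class KeyRecord {
        final String recordKey;
        final String partitionPath;

        KeyRecord(String recordKey, String partitionPath) {
            this.recordKey = recordKey;
            this.partitionPath = partitionPath;
        }
    }

    // Written first so the layout can evolve without breaking old files.
    static final int FORMAT_VERSION = 1;

    // Layout: int version, then two strings in modified UTF-8 (writeUTF).
    // Note: writeUTF rejects strings whose encoded form exceeds 65535 bytes.
    static byte[] encode(KeyRecord key) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(FORMAT_VERSION);
            out.writeUTF(key.recordKey);
            out.writeUTF(key.partitionPath);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the fields back in exactly the order they were written.
    static KeyRecord decode(byte[] payload) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(payload))) {
            int version = in.readInt();
            if (version != FORMAT_VERSION) {
                throw new IOException("Unsupported format version " + version);
            }
            String recordKey = in.readUTF();
            String partitionPath = in.readUTF();
            return new KeyRecord(recordKey, partitionPath);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] payload = encode(new KeyRecord("id-1", "2022/10/01"));
        KeyRecord copy = decode(payload);
        System.out.println(copy.recordKey + " @ " + copy.partitionPath);
    }
}
```

Because the reader knows the field order up front, nothing in the payload has
to name a class, so a separate job can decode it without any shared state.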
> Serializing HoodieKey objects using Kryo fails to deserialize data back w/o
> prior registration
> ----------------------------------------------------------------------------------------------
>
> Key: HUDI-4959
> URL: https://issues.apache.org/jira/browse/HUDI-4959
> Project: Apache Hudi
> Issue Type: Bug
> Components: writer-core
> Affects Versions: 0.12.0
> Reporter: Alexey Kudinkin
> Assignee: Alexey Kudinkin
> Priority: Blocker
> Fix For: 0.13.0
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)