Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.
The following page has been changed by ZhengShao:
http://wiki.apache.org/hadoop/Hive/DeveloperGuide

------------------------------------------------------------------------------
   * DynamicSerDe: This SerDe also reads/writes Thrift-serialized objects, but it understands Thrift DDL, so the schema of the object can be provided at runtime. It also supports many different protocols, including TBinaryProtocol, TJSONProtocol, and TCTLSeparatedProtocol (which writes data in delimited records).
  
  How to write your own SerDe:
-   * In most cases, users want to write a Deserializer instead of a SerDe.
+   * In most cases, users want to write a Deserializer instead of a SerDe, because they just want to read their own data format rather than write to it (see the example sketch below).
    * For example, the RegexDeserializer will deserialize the data using the configuration parameter 'regex', and possibly a list of column names (see serde2.MetadataTypedColumnsetSerDe). Please see serde2/Deserializer.java for details.
+   * If your SerDe supports DDL (basically, a SerDe with parameterized columns and column types), you probably want to implement a Protocol based on DynamicSerDe instead of writing a SerDe from scratch. The reason is that the framework passes the DDL to the SerDe in "thrift DDL" format, and writing a "thrift DDL" parser is non-trivial.
+ 
+ Some important points about SerDe:
+   * The SerDe, not the DDL, defines the table schema. Some SerDe implementations use the DDL for configuration, but a SerDe can also override it.
+   * Column types can be arbitrarily nested arrays, maps, and structures.
+   * The callback design of ObjectInspector allows lazy deserialization, for example with CASE/IF expressions or when using complex or nested types.
+ 
  === MetaStore ===
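
To make the Deserializer route above concrete, here is a minimal sketch of a custom Deserializer that splits delimited text into string columns, in the spirit of MetadataTypedColumnsetSerDe. It is an illustration, not code from the Hive source: the class name SimpleSplitDeserializer and the 'columns' / 'field.delim' table properties are assumptions, and it assumes the three methods described in serde2/Deserializer.java (initialize, deserialize, getObjectInspector); newer Hive releases add further methods (e.g. getSerDeStats), so check the interface for the version you build against.

{{{
package org.example.hive; // hypothetical package for this sketch

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.serde2.Deserializer;
import org.apache.hadoop.hive.serde2.SerDeException;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.io.Writable;

public class SimpleSplitDeserializer implements Deserializer {

  private String delimiter;
  private List<String> columnNames;
  private ObjectInspector rowObjectInspector;
  private List<String> row;

  // Read the column names and field delimiter from the table properties.
  public void initialize(Configuration conf, Properties tbl) throws SerDeException {
    delimiter = tbl.getProperty("field.delim", "\t");
    columnNames = Arrays.asList(tbl.getProperty("columns", "").split(","));

    // Expose every column as a string. The ObjectInspector returned here, not
    // the DDL, is what tells Hive the shape of each row.
    List<ObjectInspector> columnOIs = new ArrayList<ObjectInspector>();
    for (int i = 0; i < columnNames.size(); i++) {
      columnOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
    }
    rowObjectInspector =
        ObjectInspectorFactory.getStandardStructObjectInspector(columnNames, columnOIs);
    row = new ArrayList<String>(columnNames.size());
  }

  // Turn one Writable record (a line of text) into a row object that the
  // struct ObjectInspector above knows how to walk.
  public Object deserialize(Writable blob) throws SerDeException {
    String line = blob.toString();
    // Note: String.split treats the delimiter as a regex; -1 keeps trailing empty fields.
    String[] fields = line.split(delimiter, -1);
    row.clear();
    for (int i = 0; i < columnNames.size(); i++) {
      row.add(i < fields.length ? fields[i] : null);
    }
    return row;
  }

  public ObjectInspector getObjectInspector() throws SerDeException {
    return rowObjectInspector;
  }
}
}}}

Depending on the Hive version, such a class can then be referenced from the table definition (for example via ROW FORMAT SERDE in the CREATE TABLE statement) together with the table properties used above.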

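The "arbitrarily nested" point can also be shown with ObjectInspectors directly. The following small sketch (again an illustration; factory and field names may differ between Hive versions) composes the standard factories to describe a column of type array<map<string,int>>:

{{{
import org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.MapObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

public class NestedTypeExample {
  public static void main(String[] args) {
    // map<string,int>
    MapObjectInspector mapOI = ObjectInspectorFactory.getStandardMapObjectInspector(
        PrimitiveObjectInspectorFactory.javaStringObjectInspector,
        PrimitiveObjectInspectorFactory.javaIntObjectInspector);

    // array<map<string,int>> -- inspectors compose, so nesting can go as deep as needed
    ListObjectInspector listOI =
        ObjectInspectorFactory.getStandardListObjectInspector(mapOI);

    // A Deserializer returns an ObjectInspector like this from getObjectInspector();
    // Hive then walks rows through these callbacks instead of eagerly converting
    // every field, which is what enables lazy deserialization.
    System.out.println(listOI.getTypeName()); // expected: array<map<string,int>>
  }
}
}}}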