Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.

The following page has been changed by ZhengShao:
http://wiki.apache.org/hadoop/Hive/DeveloperGuide

------------------------------------------------------------------------------
    * DynamicSerDe: This SerDe also reads and writes Thrift-serialized objects, 
but it understands Thrift DDL, so the schema of the object can be provided at 
runtime. It also supports several different protocols, including 
TBinaryProtocol, TJSONProtocol, and TCTLSeparatedProtocol (which writes data in 
delimited records).
  
  How to write your own SerDe:
-   * In most cases, users want to write a Deserializer instead of a SerDe.
+   * In most cases, users want to write a Deserializer instead of a SerDe, 
because they only need to read their own data format, not write to it.
    * For example, the RegexDeserializer will deserialize the data using the 
configuration parameter 'regex', and possibly a list of column names (see 
serde2.MetadataTypedColumnsetSerDe). Please see serde2/Deserializer.java for 
details.
+   * If your SerDe supports DDL (that is, a SerDe with parameterized columns 
and column types), you probably want to implement a Protocol based on 
DynamicSerDe instead of writing a SerDe from scratch. The framework passes DDL 
to the SerDe in "thrift DDL" format, and writing a "thrift DDL" parser is 
non-trivial.
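
The regex-driven approach described in the list above can be sketched as follows. This is only an illustration under stated assumptions: RegexRowSketch and its methods are hypothetical names, and the class does not implement Hive's serde2 Deserializer interface (see serde2/Deserializer.java for the real contract). It shows a configured 'regex' splitting one input line into column values, one per capturing group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-in for a regex-based deserializer; NOT Hive's API. */
class RegexRowSketch {
    private final Pattern pattern;          // compiled from the 'regex' parameter
    private final List<String> columnNames; // optional column names, one per group

    RegexRowSketch(String regex, List<String> columnNames) {
        this.pattern = Pattern.compile(regex);
        this.columnNames = columnNames;
    }

    /** Returns one value per capturing group, or null if the line does not match. */
    List<String> deserialize(String line) {
        Matcher m = pattern.matcher(line);
        if (!m.matches()) {
            return null;
        }
        List<String> row = new ArrayList<>();
        for (int i = 1; i <= m.groupCount(); i++) {
            row.add(m.group(i));
        }
        return row;
    }

    List<String> getColumnNames() {
        return columnNames;
    }
}
```

Because this sketch only reads data, it has no serialize method, which matches the point that most users need a Deserializer rather than a full SerDe.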
+ 
+ Some important points about SerDe:
+   * The SerDe, not the DDL, defines the table schema. Some SerDe 
implementations use the DDL for configuration, but a SerDe can also override it.
+   * Column types can be arbitrarily nested arrays, maps and structures.
+   * The callback design of ObjectInspector allows lazy deserialization, for 
example when a query uses CASE/IF or when columns have complex or nested types.
+ 
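The lazy deserialization point above can be illustrated with a minimal sketch. All names here (FieldInspectorSketch, LazyRowSketch, LazyInspectorSketch, LazyEvalDemo) are hypothetical and are not Hive's ObjectInspector API; the sketch only assumes that the caller asks an inspector for each field it needs, so fields that are never requested are never parsed.

```java
import java.util.regex.Pattern;

/** Hypothetical callback interface; NOT Hive's ObjectInspector. */
interface FieldInspectorSketch {
    Object getField(Object row, int index);
}

/** Keeps the raw delimited text and parses a field only on first access. */
class LazyRowSketch {
    private final String[] raw;
    private final Integer[] parsed;
    int parseCount = 0; // number of fields actually parsed so far

    LazyRowSketch(String line, String delimiter) {
        this.raw = line.split(Pattern.quote(delimiter), -1);
        this.parsed = new Integer[raw.length];
    }

    Integer field(int i) {
        if (parsed[i] == null) {
            parsed[i] = Integer.valueOf(raw[i]);
            parseCount++;
        }
        return parsed[i];
    }
}

/** Inspector that forwards each field request to the lazy row. */
class LazyInspectorSketch implements FieldInspectorSketch {
    public Object getField(Object row, int index) {
        return ((LazyRowSketch) row).field(index);
    }
}

class LazyEvalDemo {
    /** Evaluates IF(col0 > 0, col1, col2), touching only the columns it needs. */
    static Object ifPositive(FieldInspectorSketch oi, Object row) {
        int cond = (Integer) oi.getField(row, 0);
        return cond > 0 ? oi.getField(row, 1) : oi.getField(row, 2);
    }
}
```

In this sketch, evaluating the IF over a three-column row parses only the condition column and the selected branch; the other column stays as raw text.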
  
  === MetaStore ===
  
