I think it would be better to take a look at LazySimpleSerDe to see how it serializes and deserializes struct types. Your implementation should produce rows that work with this SerDe seamlessly.
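To make that concrete, here is a minimal sketch of the kind of row LazySimpleSerDe can parse: the fields written as UTF-8 text and separated by the SerDe's default field delimiter, Ctrl-A. The class name, method name, and the three fields (borrowed from the table definition at the bottom of this thread) are hypothetical, and the sketch assumes the string field never contains the delimiter itself.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: flattens three value fields into one delimited row. */
final class DelimitedRowSketch {
    // Ctrl-A (U+0001) is the default field delimiter of LazySimpleSerDe.
    private static final char FIELD_DELIM = '\u0001';

    /** Encodes the fields as UTF-8 text separated by FIELD_DELIM. */
    static byte[] encode(long valueField1, String valueField2, long valueField3) {
        StringBuilder row = new StringBuilder();
        row.append(valueField1).append(FIELD_DELIM);
        // Assumption: valueField2 does not itself contain FIELD_DELIM.
        row.append(valueField2).append(FIELD_DELIM);
        row.append(valueField3);
        return row.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] row = encode(42L, "abc", 7L);
        // In a record reader these bytes would be handed to the SerDe wrapped
        // in a writable, e.g. new org.apache.hadoop.io.BytesWritable(row).
        System.out.println(new String(row, StandardCharsets.UTF_8).replace(FIELD_DELIM, '|'));
    }
}
```

The point is that the bytes carry a text encoding of the fields, not a Java-serialized object.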
More specifically, creating a simple POJO may not work due to inherent marshaling/encoding semantics that must be observed to conform to the BytesWritable contracts.

Arvind

On Fri, Apr 16, 2010 at 11:04 AM, Sagar Naik <[email protected]> wrote:

> Hi Arvind,
> Thanks for explanation.
>
> I am newbie so I am not familiar with terms.
> Struct implementation is POJO or some thing else.
>
> My guess is struct is a simple POJO . If so then simple POJO represented in
> BYTES will be passed to BytesWritable .
> And it should work ?
>
> -Sagar
>
> On Apr 16, 2010, at 9:58 AM, Arvind Prabhakar wrote:
>
> Sagar,
>
> Unfortunately it is more complicated than that. The idea behind the record
> reader implementation is to actually convert the underlying writable into a
> type that is understood by the SerDe layer. At this time, the SerDe layer
> seems to understand BytesWritable and Text types. So - if you could take your
> custom type and emit a BytesWritable that represents a struct implementation
> of the same, it would work.
>
> Another alternative which would be simple to implement would be to do the
> following:
>
> 1. Modify your custom writable so that it has a toString() method that
> generates a parsable representation of the fields. For example you could use
> the JSON representation in your toString() method.
>
> 2. Create the external table with inputformat
> 'org.apache.hadoop.mapred.SequenceFileAsTextInputFormat' and outputformat
> 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat', mapping the
> entire value type to a single string column.
>
> 3. Use the UDFJson to extract the individual attributes from the JSON
> string that is emitted from the select query.
>
> You can use this output to populate a new table that now has the parsed
> values separated out in the warehouse.
>
> Arvind
>
>
> On Thu, Apr 15, 2010 at 6:01 PM, Sagar Naik <[email protected]> wrote:
>
>> Hi Arvind,
>>
>> U guessed it correct.
>>
>> We have custom writables.
>> I saw the TextRecordReader implementation to get an idea on RecordReader.
>>
>> It looks like createRow creates an instance and next(...) populates this
>> instance.
>> The createRow returns an instance of Writable.
>>
>> Is the Writable Instance same as "struct" from u r reply
>>
>> How is this Writable instance mapped to column names ?
>> Is there something in commandline syntax which binds the Writable instance
>> to column names and values ?
>> Or ObjectInspector will do it magically
>>
>> -Sagar
>> On Apr 15, 2010, at 12:00 PM, Arvind Prabhakar wrote:
>>
>> Hi Sagar,
>>
>> Looks like your source file has custom writable types in it. If that is
>> the case, implementing a SerDe that works with that type may not be that
>> straight forward, although doable.
>>
>> An alternative would be to implement a custom RecordReader that converts
>> the value of your custom writable to Struct type which can then be queried
>> directly.
>>
>> Arvind
>>
>> On Thu, Apr 15, 2010 at 1:06 AM, Sagar Naik <[email protected]> wrote:
>>
>>> Hi
>>>
>>> My data is in the value field of a sequence file.
>>> The value field has subfields in it. I am trying to create table using
>>> these subfields.
>>> Example:
>>> <KEY> <VALUE>
>>> <KEY_FIELD1, KEYFIELD 2> forms the key
>>> <VALUE_FIELD1, VALUE_FIELD2, VALUE_FIELD3>.
>>> So i am trying to create a table from VALUE_FIELD*
>>>
>>> CREATE EXTERNAL TABLE table_name (VALUE_FIELD1 as BIGINT, VALUE_FIELD2 as
>>> string, VALUE_FIELD3 as BIGINT ) STORED AS SEQUENCEFILE;
>>>
>>> I am planing to a write a custom SerDe implementation and custom
>>> SequenceFileReader
>>> Pl let me knw if I am on the right track.
>>>
>>>
>>> -Sagar
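
For reference, step 1 of the toString() alternative described earlier in the thread could look roughly like the sketch below. The class and field names are hypothetical stand-ins for the custom writable, the string field is assumed to be non-null, and the hand-rolled escaping is only there to keep the example free of third-party JSON libraries. Control characters are escaped so that each value stays on a single line when it is read back as text.

```java
/** Hypothetical stand-in for the value fields of the custom writable. */
final class CustomValueSketch {
    private final long valueField1;
    private final String valueField2; // assumed non-null in this sketch
    private final long valueField3;

    CustomValueSketch(long valueField1, String valueField2, long valueField3) {
        this.valueField1 = valueField1;
        this.valueField2 = valueField2;
        this.valueField3 = valueField3;
    }

    /** Escapes quotes, backslashes, and control characters for a JSON string. */
    static String escapeJson(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"' || c == '\\') {
                out.append('\\').append(c);
            } else if (c < 0x20) {
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    /** Emits the fields as a single-line JSON object. */
    @Override
    public String toString() {
        return "{\"value_field1\":" + valueField1
            + ",\"value_field2\":\"" + escapeJson(valueField2) + "\""
            + ",\"value_field3\":" + valueField3 + "}";
    }

    public static void main(String[] args) {
        System.out.println(new CustomValueSketch(1L, "abc", 3L));
    }
}
```

With the whole value mapped to one string column, the individual attributes would then be pulled out on the Hive side with get_json_object, the function backed by UDFJson, using paths such as '$.value_field1'.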
