Re: table from sequence file

Arvind Prabhakar Fri, 16 Apr 2010 12:05:57 -0700

I think it would be better to look at LazySimpleSerDe to see how it
serializes and deserializes struct types. Your implementation should work
with this SerDe seamlessly.


More specifically, creating a simple POJO may not work, because of the
marshaling/encoding semantics that must be observed to conform to the
BytesWritable contract.
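To make the encoding point concrete, here is a small JDK-only sketch; EncodingContrast and its methods are made-up names for illustration. It assumes a value with a single long field and compares the bytes a Writable-style write(DataOutput) typically produces with the text bytes a delimited SerDe such as LazySimpleSerDe parses for a BIGINT column. The two do not match, which is why handing over a POJO's binary form is unlikely to work.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper contrasting two byte encodings of the same long value.
class EncodingContrast {

    // Binary form, as a Writable-style write(DataOutput) would typically emit it:
    // DataOutput.writeLong always writes 8 big-endian bytes.
    static byte[] binaryLong(long value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeLong(value);
        }
        return buffer.toByteArray();
    }

    // Text form: the decimal digits encoded as UTF-8, which is what a
    // delimited text SerDe such as LazySimpleSerDe reads for a BIGINT column.
    static byte[] textLong(long value) {
        return Long.toString(value).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(Arrays.toString(binaryLong(42L)));
        System.out.println(Arrays.toString(textLong(42L)));
    }
}
```

binaryLong(42L) yields eight bytes (seven zeros, then 42), while textLong(42L) yields only the two UTF-8 digits '4' and '2'.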

Arvind

On Fri, Apr 16, 2010 at 11:04 AM, Sagar Naik <[email protected]> wrote:

> Hi Arvind,
> Thanks for the explanation.
>
> I am a newbie, so I am not familiar with the terms.
> Is a struct implementation a POJO or something else?
>
> My guess is that a struct is a simple POJO. If so, the simple POJO
> represented in bytes will be passed to BytesWritable,
> and then it should work?
>
>
>
> -Sagar
>
> On Apr 16, 2010, at 9:58 AM, Arvind Prabhakar wrote:
>
> Sagar,
>
> Unfortunately it is more complicated than that. The idea behind the record
> reader implementation is to actually convert the underlying writable into a
> type that is understood by the SerDe layer. At this time, the SerDe layer
> seems to understand the BytesWritable and Text types. So if you could take
> your custom type and emit a BytesWritable that represents a struct
> implementation of the same, it would work.
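A rough sketch of that conversion, assuming the value carries the three subfields from the original question (a long, a string and a long) and that the table uses LazySimpleSerDe's default layout with Ctrl-A (0x01) between fields. CustomValue and ValueConverter are hypothetical names. A real version would implement org.apache.hadoop.io.Writable and wrap the resulting bytes in a BytesWritable inside the record reader's next(); both are left out so the sketch compiles against the plain JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical custom value. A real one would declare
// "implements org.apache.hadoop.io.Writable"; only the two method
// signatures are mirrored here so the sketch runs on the plain JDK.
class CustomValue {
    long field1;
    String field2;
    long field3;

    public void write(DataOutput out) throws IOException {
        out.writeLong(field1);
        out.writeUTF(field2);
        out.writeLong(field3);
    }

    public void readFields(DataInput in) throws IOException {
        field1 = in.readLong();
        field2 = in.readUTF();
        field3 = in.readLong();
    }
}

// Conversion a custom record reader could apply in next(): render each field
// as text and join the fields with Ctrl-A (0x01), LazySimpleSerDe's default
// field delimiter. Assumes field2 never contains 0x01 or a newline, since
// nothing is escaped here.
class ValueConverter {
    static final char FIELD_DELIMITER = '\u0001';

    static byte[] toDelimitedBytes(CustomValue value) {
        StringBuilder row = new StringBuilder();
        row.append(value.field1).append(FIELD_DELIMITER)
           .append(value.field2).append(FIELD_DELIMITER)
           .append(value.field3);
        return row.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        CustomValue original = new CustomValue();
        original.field1 = 42L;
        original.field2 = "abc";
        original.field3 = 7L;

        // Serialize the value in its binary Writable-style form.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        original.write(out);
        out.flush();

        // Decode it again, then convert it to the delimited text form.
        CustomValue decoded = new CustomValue();
        decoded.readFields(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
        String row = new String(toDelimitedBytes(decoded), StandardCharsets.UTF_8);
        System.out.println(row.replace(FIELD_DELIMITER, '|'));
    }
}
```

main() round-trips a value through its binary form and prints the converted row with the delimiter shown as '|'. Embedded delimiters or newlines in the string field would need escaping, which this sketch does not attempt.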
>
> Another alternative, which would be simple to implement, is to do the
> following:
>
> 1. Modify your custom writable so that it has a toString() method that
> generates a parsable representation of the fields. For example you could use
> the JSON representation in your toString() method.
>
> 2. Create the external table with inputformat
> 'org.apache.hadoop.mapred.SequenceFileAsTextInputFormat' and outputformat
> 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat', mapping the
> entire value type to a single string column.
>
> 3. Use UDFJson (exposed in HiveQL as get_json_object) in your select query
> to extract the individual attributes from the JSON string.
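Step 1 might look roughly like the following. The class name, the field names and the hand-rolled escaping are illustrative assumptions (a JSON library would normally handle the escaping). Only toString() matters for this approach, so the Writable methods are omitted.

```java
import java.util.Locale;

// Hypothetical custom value for step 1. A real class would also implement
// org.apache.hadoop.io.Writable; that part is omitted here.
class JsonCustomValue {
    final long field1;
    final String field2;
    final long field3;

    JsonCustomValue(long field1, String field2, long field3) {
        this.field1 = field1;
        this.field2 = field2;
        this.field3 = field3;
    }

    // Minimal JSON string quoting: escapes quotes, backslashes and control characters.
    private static String quote(String s) {
        if (s == null) {
            return "null";
        }
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format(Locale.ROOT, "\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    // One JSON object per value, so Hive sees a single parsable string column.
    @Override
    public String toString() {
        return "{\"field1\":" + field1
                + ",\"field2\":" + quote(field2)
                + ",\"field3\":" + field3 + "}";
    }

    public static void main(String[] args) {
        System.out.println(new JsonCustomValue(42L, "abc", 7L));
    }
}
```

With the value mapped to a single string column (called value here, a hypothetical name), an expression along the lines of get_json_object(value, '$.field1') would pull out one attribute.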
>
> You can use this output to populate a new table that now has the parsed
> values separated out in the warehouse.
>
> Arvind
>
>
> On Thu, Apr 15, 2010 at 6:01 PM, Sagar Naik <[email protected]> wrote:
>
>> Hi Arvind,
>>
>> You guessed correctly.
>>
>> We have custom writables.
>> I looked at the TextRecordReader implementation to get an idea of how a
>> RecordReader works.
>>
>> It looks like createRow() creates an instance and next(...) populates this
>> instance.
>> createRow() returns an instance of Writable.
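The createRow()/next(...) pattern described here is an object-reuse idiom: one row object is allocated up front and overwritten for every record. The JDK-only analogue below illustrates the idiom; RowReader, MutableLine, LineRowReader and RowReaderDemo are hypothetical and do not reproduce Hive's actual RecordReader interface.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader contract: createRow() hands out one mutable row object,
// and next(row) refills that same object, returning false at end of input.
interface RowReader<R> {
    R createRow();

    boolean next(R row) throws IOException;
}

// Mutable holder standing in for a reusable Writable such as Text.
class MutableLine {
    final StringBuilder text = new StringBuilder();
}

// Line-oriented reader that overwrites the caller's row on every call.
class LineRowReader implements RowReader<MutableLine> {
    private final BufferedReader in;

    LineRowReader(BufferedReader in) {
        this.in = in;
    }

    @Override
    public MutableLine createRow() {
        return new MutableLine();
    }

    @Override
    public boolean next(MutableLine row) throws IOException {
        String line = in.readLine();
        if (line == null) {
            return false;
        }
        row.text.setLength(0); // clear and reuse the same buffer
        row.text.append(line);
        return true;
    }
}

class RowReaderDemo {
    // Reads every line using a single row object, copying each record out
    // because the row is overwritten by the next call.
    static List<String> readAll(String input) throws IOException {
        RowReader<MutableLine> reader = new LineRowReader(new BufferedReader(new StringReader(input)));
        MutableLine row = reader.createRow(); // allocated once
        List<String> records = new ArrayList<>();
        while (reader.next(row)) {
            records.add(row.text.toString());
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readAll("first\nsecond\n"));
    }
}
```

Because the same row object is overwritten on every call, a caller that keeps records must copy them out, as readAll() does.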
>>
>> Is the Writable instance the same as the "struct" from your reply?
>>
>> How is this Writable instance mapped to column names?
>> Is there something in the command-line syntax that binds the Writable
>> instance to column names and values?
>> Or will the ObjectInspector do it magically?
>>
>> -Sagar
>> On Apr 15, 2010, at 12:00 PM, Arvind Prabhakar wrote:
>>
>> Hi Sagar,
>>
>> It looks like your source file has custom writable types in it. If that
>> is the case, implementing a SerDe that works with that type may not be
>> that straightforward, although it is doable.
>>
>> An alternative would be to implement a custom RecordReader that converts
>> the value of your custom writable to a struct type, which can then be
>> queried directly.
>>
>> Arvind
>>
>> On Thu, Apr 15, 2010 at 1:06 AM, Sagar Naik <[email protected]> wrote:
>>
>>> Hi
>>>
>>> My data is in the value field of a sequence file.
>>> The value field has subfields in it. I am trying to create a table using
>>> these subfields.
>>> Example:
>>> <KEY> <VALUE>
>>> <KEY_FIELD1, KEY_FIELD2> forms the key and
>>> <VALUE_FIELD1, VALUE_FIELD2, VALUE_FIELD3> forms the value.
>>> So I am trying to create a table from the VALUE_FIELD* subfields:
>>>
>>> CREATE EXTERNAL TABLE table_name (VALUE_FIELD1 BIGINT, VALUE_FIELD2
>>> STRING, VALUE_FIELD3 BIGINT) STORED AS SEQUENCEFILE;
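As for how rows line up with the columns above: with LazySimpleSerDe, the fields of a row are matched to the DDL columns by position, not by name. The sketch below mimics that for this three-column table, assuming the default Ctrl-A (0x01) delimiter and \N as the null marker. DelimitedRowParser is a hypothetical helper, not Hive code, and it skips details such as escaping and custom delimiters.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch of positional field-to-column mapping in the style of LazySimpleSerDe.
// Assumptions: Ctrl-A (0x01) field delimiter, \N as the null marker, and exactly
// the columns VALUE_FIELD1 BIGINT, VALUE_FIELD2 STRING, VALUE_FIELD3 BIGINT.
class DelimitedRowParser {
    static final String NULL_MARKER = "\\N";

    static Object[] parse(byte[] rowBytes) {
        String row = new String(rowBytes, StandardCharsets.UTF_8);
        // A limit of -1 keeps trailing empty fields instead of dropping them.
        String[] fields = row.split("\u0001", -1);
        Object[] columns = new Object[3];
        columns[0] = toBigint(fieldAt(fields, 0)); // VALUE_FIELD1
        columns[1] = fieldAt(fields, 1);           // VALUE_FIELD2
        columns[2] = toBigint(fieldAt(fields, 2)); // VALUE_FIELD3
        return columns;
    }

    // Missing trailing fields and the null marker both become NULL.
    private static String fieldAt(String[] fields, int i) {
        if (i >= fields.length || fields[i].equals(NULL_MARKER)) {
            return null;
        }
        return fields[i];
    }

    // An unparsable number is read as NULL rather than failing the row.
    private static Long toBigint(String s) {
        if (s == null) {
            return null;
        }
        try {
            return Long.parseLong(s);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        byte[] row = "42\u0001abc\u00017".getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(parse(row)));
    }
}
```

The column names themselves never appear in the data; only their order in the DDL decides which field feeds which column.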
>>>
>>> I am planning to write a custom SerDe implementation and a custom
>>> SequenceFileReader.
>>> Please let me know if I am on the right track.
>>>
>>>
>>> -Sagar
>>
>>
>>
>>
>
>
