Re: SerDe and Rows

Sanjit Jhala Thu, 03 Jun 2010 12:37:28 -0700

I'm wondering why the Split class needs to extend FileSplit and also why the
InputFormat needs to call FileInputFormat.getInputPaths(job) in getSplits.
Is this because of legacy code that needs to be cleaned up or does it get
used somewhere?


-Sanjit
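A minimal, self-contained sketch of the pattern the question above describes, built on a stated assumption: that Hive's generic input-format wrapper identifies which table or partition a split belongs to by looking up the split's path. Under that assumption, a split for a non-file store still has to extend the file-split type and carry the table location (obtained, for example, from `FileInputFormat.getInputPaths(job)`). Otherwise the path-keyed lookup finds nothing. `PathSplit`, `KeyRangeSplit`, `getSplits`, `resolveTable` and the warehouse path are hypothetical stand-ins. They are not Hadoop's or Hive's actual API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class SplitPathSketch {

    // Hypothetical stand-in for a path-carrying split such as Hadoop's FileSplit.
    static class PathSplit {
        private final String path;

        PathSplit(String path) {
            this.path = path;
        }

        String getPath() {
            return path;
        }
    }

    // Hypothetical split for a key-range store (e.g. a Hypertable or HBase table).
    // It holds a row-key range but extends PathSplit so generic code can still
    // ask it for a path.
    static class KeyRangeSplit extends PathSplit {
        final String startRow;
        final String endRow;

        KeyRangeSplit(String tableLocation, String startRow, String endRow) {
            super(tableLocation);
            this.startRow = startRow;
            this.endRow = endRow;
        }
    }

    // Mimics getSplits: the table location (assumed to come from the job's
    // input paths) is stamped onto every split.
    static List<KeyRangeSplit> getSplits(String[] inputPaths, String[][] ranges) {
        String tableLocation = inputPaths[0];
        List<KeyRangeSplit> splits = new ArrayList<>();
        for (String[] range : ranges) {
            splits.add(new KeyRangeSplit(tableLocation, range[0], range[1]));
        }
        return splits;
    }

    // Mimics the ASSUMED path-keyed lookup from a split to its table metadata.
    static String resolveTable(Map<String, String> pathToTable, PathSplit split) {
        return pathToTable.get(split.getPath());
    }

    public static void main(String[] args) {
        Map<String, String> pathToTable =
            Map.of("/user/hive/warehouse/ht_table", "ht_table");
        String[] inputPaths = {"/user/hive/warehouse/ht_table"};
        String[][] ranges = {{"a", "m"}, {"m", "z"}};

        for (KeyRangeSplit s : getSplits(inputPaths, ranges)) {
            // Every split resolves to the same table through its path.
            System.out.println(s.startRow + ".." + s.endRow + " -> "
                + resolveTable(pathToTable, s));
        }
    }
}
```

If the assumption holds, the "dummy" path is not legacy code. It is the key that ties each split back to the table's metadata.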

On Wed, Jun 2, 2010 at 12:59 PM, Edward Capriolo <[email protected]> wrote:

>
>
> On Wed, Jun 2, 2010 at 3:17 PM, Sanjit Jhala <[email protected]> wrote:
>
>> Thanks, that sounds great! Would love to come to the meetup. Thanks for
>> all the work on the Storage Handlers; it's really nifty stuff.
>> I'm getting close on the Hypertable storage handler and will definitely
>> send out pointers once it's ready.
>>
>> -Sanjit
>>
>>
>> On Wed, Jun 2, 2010 at 11:56 AM, John Sichi <[email protected]> wrote:
>>
>>> Based on some recent offline discussions, it looks like Cloudera will be
>>> taking the lead on driving the release process for 0.6, so expect to see
>>> some initial plans on that here soon.
>>>
>>> We're thinking of classifying new features and frameworks as stable vs
>>> experimental.  For 0.6, items like storage handlers will definitely be
>>> classified as experimental, meaning they'll be there in the code, but
>>> expected to continue to evolve with breaking changes until they are declared
>>> stable in a subsequent release.
>>>
>>> We would also like to start holding monthly Hive developer meetups; it
>>> would be great if someone from Hypertable could attend those. It's heartening
>>> to see so much interest in building up a storage handler ecosystem.
>>>
>>> I think the snapshot you reference is fine for trunk development work.
>>>
>>> Regarding thrift, here's info on the version currently being used:
>>>
>>> http://wiki.apache.org/hadoop/Hive/HowToContribute#Generating_Code
>>>
>>> JVS
>>>
>>> On Jun 2, 2010, at 11:19 AM, Sanjit Jhala wrote:
>>>
>>> Any idea when the next Hive release is scheduled for and whether the
>>> Storage Handler code will be included ?
>>>
>>> Also I'm currently using a snapshot from the trunk at commit:
>>>
>>> commit bf7e3b9cc6c6ceced2dec70f0971ecc91fd0dcb3
>>> Author: Namit Jain <[email protected]>
>>> Date:   Thu May 6 19:05:52 2010 +0000
>>>
>>>     HIVE-1317. CombineHiveInputFormat throws exception when partition name contains special characte
>>>     (Ning Zhang via namit)
>>>
>>>     git-svn-id: https://svn.apache.org/repos/asf/hadoop/hive/tr...@94186013f79535-47bb-0310-9956-ff
>>>
>>>
>>> Is this a reasonably stable commit, or would you suggest another? Also,
>>> how do I figure out the corresponding Thrift version?
>>>
>>> -Sanjit
>>>
>>>
>>>
>>>
>>> On Tue, Jun 1, 2010 at 5:36 PM, John Sichi <[email protected]> wrote:
>>>
>>>> On Jun 1, 2010, at 4:45 PM, Sanjit Jhala wrote:
>>>>
>>>> > That looks cool. On a different note, it looks like the
>>>> HiveStorageHandler is based on the old Hadoop "mapred" interface. Any idea
>>>> when you plan to migrate to the "mapreduce" interface?
>>>>
>>>>
>>>> This one would be painful to do with shims, so I think it has to wait
>>>> until we drop support entirely for pre-0.20 Hadoop versions on Hive trunk.
>>>>  For Facebook, we may be ready for that within a few months; I'm not sure
>>>> about other Hive users.
>>>>
>>>> JVS
>>>>
>>>>
>>>
>>>
>> IMHO...
>
> Trunk has a lot of features that 0.5.0 does not have. All (or most) of the
> development for Hive happens on trunk. Trunk changes 2-3 times a week, so
> it is a moving target.
>
>
>
> Hive is all userspace code; anyone who understands that can have 100
> different versions of Hive in their home directory, all configured against
> the same metastore and HDFS.
>
> I currently have Hive 0.5.0, the latest release, installed on the system path,
> plus /usr/bin/hive6 -> another Hive install (of trunk).
>
> This gives me the best of both worlds. Users can pick and choose the Hive
> they want to run. I'm not really caught up in releases. They are good
> things, but in general I cannot wait for them.
>
> Edward
>
>
>
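John's remark about shims is easier to follow with a sketch of the general shim technique. The rest of the code base programs against one version-neutral interface. A version-specific implementation is then chosen at runtime from the Hadoop version string. All names here (`HadoopShims`, `Shims019`, `Shims020`, `ShimLoader.forVersion`, `describeJobSubmission`) are hypothetical. They are only loosely inspired by the idea of Hive's shims layer and are not its actual API.

```java
import java.util.HashMap;
import java.util.Map;

public class ShimSketch {

    // Version-neutral interface that the rest of the code base programs against.
    interface HadoopShims {
        String describeJobSubmission(String jobName);
    }

    // Hypothetical implementation for pre-0.20 Hadoop.
    static class Shims019 implements HadoopShims {
        public String describeJobSubmission(String jobName) {
            return "0.19-style submission of " + jobName;
        }
    }

    // Hypothetical implementation for Hadoop 0.20.
    static class Shims020 implements HadoopShims {
        public String describeJobSubmission(String jobName) {
            return "0.20-style submission of " + jobName;
        }
    }

    static class ShimLoader {
        // Shims keyed by "major.minor" version prefix.
        private static final Map<String, HadoopShims> SHIMS = new HashMap<>();

        static {
            SHIMS.put("0.19", new Shims019());
            SHIMS.put("0.20", new Shims020());
        }

        // Picks a shim from a full version string such as "0.20.2".
        static HadoopShims forVersion(String version) {
            String[] parts = version.split("\\.");
            if (parts.length < 2) {
                throw new IllegalArgumentException("Unparseable version: " + version);
            }
            HadoopShims shims = SHIMS.get(parts[0] + "." + parts[1]);
            if (shims == null) {
                throw new IllegalArgumentException("No shims for Hadoop " + version);
            }
            return shims;
        }
    }

    public static void main(String[] args) {
        // prints "0.20-style submission of q1"
        System.out.println(ShimLoader.forVersion("0.20.2").describeJobSubmission("q1"));
    }
}
```

A shim like this can hide differences in *how* an operation is performed. It is a poorer fit when the types that third-party code implements differ between versions. Presumably that is why moving storage handlers from the "mapred" to the "mapreduce" interfaces is hard to do this way.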
