Re: How to define the format of files outputed by command INSERT OVERWRITE LOCAL DIRECTORY?

Zheng Shao Fri, 05 Jun 2009 02:30:22 -0700

Let me summarize the steps:
1. Create an external table on HDFS (STORED AS TEXTFILE) and INSERT OVERWRITE
into that table.
2. Use "hadoop dfs -getmerge" to copy the data to a local file.

So there is no need to run an "INSERT OVERWRITE LOCAL DIRECTORY".
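
The two steps could look roughly like the sketch below. The table name
staticacs and its columns come from the explain output quoted further down;
the export table name, the HDFS and local paths, the comma delimiter, and the
dry-run wrapper are assumptions made only for illustration. Newer Hive
versions may treat some of these column names as reserved words, so they may
need quoting there.

```shell
#!/bin/sh
# Sketch of the two-step export. Names marked "assumed" are illustrative only.

SRC_TABLE=staticacs                  # table from the explain output in this thread
EXPORT_TABLE=staticacs_export        # assumed name for the external table
HDFS_DIR=/tmp/staticacs_export       # assumed HDFS location of that table
LOCAL_FILE=/home/hive/result.txt     # assumed local target file

# Step 1: an external TEXTFILE table with an explicit row format, then fill it.
# Column names and types follow the explain output; the ',' delimiter is assumed.
# Output compression is switched off so the merged file is plain text (the
# explain output in this thread shows "compressed: true").
build_hql() {
  cat <<EOF
SET hive.exec.compress.output=false;
CREATE EXTERNAL TABLE ${EXPORT_TABLE} (
  time STRING, application STRING, count1 INT, count2 INT, dt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '${HDFS_DIR}';
INSERT OVERWRITE TABLE ${EXPORT_TABLE} SELECT * FROM ${SRC_TABLE};
EOF
}

# Print each command by default; set DRY_RUN=0 to actually execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

run hive -e "$(build_hql)"

# Step 2: merge the table's files on HDFS into a single local file.
run hadoop dfs -getmerge "$HDFS_DIR" "$LOCAL_FILE"
```

Run as-is, the script only prints the hive and hadoop commands; with
DRY_RUN=0 it runs them. The row format lives in the table definition, which
is the point of going through an external table instead of writing to a
local directory directly.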

Zheng

On Fri, Jun 5, 2009 at 2:16 AM, Min Zhou <[email protected]> wrote:

> hive> explain INSERT OVERWRITE LOCAL DIRECTORY '/home/hive/result' SELECT *
> FROM staticacs;
> OK
> ABSTRACT SYNTAX TREE:
>   (TOK_QUERY (TOK_FROM (TOK_TABREF staticacs)) (TOK_INSERT (TOK_DESTINATION
> (TOK_LOCAL_DIR '/home/hive/result')) (TOK_SELECT (TOK_SELEXPR
> TOK_ALLCOLREF))))
>
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
>
> STAGE PLANS:
>   Stage: Stage-1
>     Map Reduce
>       Alias -> Map Operator Tree:
>         staticacs
>             Select Operator
>               expressions:
>                     expr: time
>                     type: string
>                     expr: application
>                     type: string
>                     expr: count1
>                     type: int
>                     expr: count2
>                     type: int
>                     expr: dt
>                     type: string
>               File Output Operator
>                 compressed: true
>                 GlobalTableId: 1
>                 table:
>                     input format: org.apache.hadoop.mapred.TextInputFormat
>                     output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>
>   Stage: Stage-0
>     Move Operator
>       files:
>             hdfs directory: false
>             destination: /home/hive/result
>
> Thanks,
> Min
>
>
> On Fri, Jun 5, 2009 at 4:53 PM, Zheng Shao <[email protected]> wrote:
>
>> Can you give an example? Please also include the results of "explain
>> <query>" so we can see how many map-reduce jobs there are.
>>
>> Zheng
>>
>>
>>
>> On Fri, Jun 5, 2009 at 1:21 AM, Min Zhou <[email protected]> wrote:
>>
>>> But is there any way to define the row format of the files output by the
>>> command "INSERT OVERWRITE LOCAL DIRECTORY '/tmp/aaa' SELECT ..."? If I create
>>> an external table to store the result and then copy it to local with that
>>> command, more map-reduce jobs are needed.
>>>
>>> Min
>>>
>>>
>>> On Fri, Jun 5, 2009 at 2:04 PM, Zheng Shao <[email protected]> wrote:
>>>
>>>> Use dfs -getmerge.
>>>>
>>>> By the way, if you just want text file format, "INSERT OVERWRITE LOCAL
>>>> DIRECTORY '/tmp/aaa' SELECT ..." is good enough. You don't need to split
>>>> the process into 2 steps.
>>>>
>>>> Zheng
>>>>
>>>>
>>>> On Thu, Jun 4, 2009 at 8:22 PM, Min Zhou <[email protected]> wrote:
>>>>
>>>>> So how do you copy them back?
>>>>> Use dfs -getmerge or INSERT OVERWRITE LOCAL DIRECTORY?
>>>>>
>>>>> Thanks,
>>>>> Min
>>>>>
>>>>>
>>>>> On Thu, Jun 4, 2009 at 1:48 PM, Zheng Shao <[email protected]> wrote:
>>>>>
>>>>>> No, we would need to copy the file back from HDFS. But I think there
>>>>>> is no extra overhead: INSERT OVERWRITE LOCAL DIRECTORY does the same
>>>>>> thing.
>>>>>>
>>>>>> Zheng
>>>>>>
>>>>>>
>>>>>> On Wed, Jun 3, 2009 at 10:45 PM, Min Zhou <[email protected]> wrote:
>>>>>>
>>>>>>> Can an external table be stored in a local directory?
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jun 4, 2009 at 1:42 PM, Zheng Shao <[email protected]> wrote:
>>>>>>>
>>>>>>>> You can first create an external table and then insert into that
>>>>>>>> table.
>>>>>>>>
>>>>>>>> Zheng
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Jun 3, 2009 at 10:31 PM, Min Zhou <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> Any help?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> thanks,
>>>>>>>>> Min
>>>>>>>>> --
>>>>>>>>> My research interests are distributed systems, parallel computing
>>>>>>>>> and bytecode based virtual machine.
>>>>>>>>>
>>>>>>>>> My profile:
>>>>>>>>> http://www.linkedin.com/in/coderplay
>>>>>>>>> My blog:
>>>>>>>>> http://coderplay.javaeye.com
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Yours,
>>>>>>>> Zheng
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Yours,
>>>>>> Zheng
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Yours,
>>>> Zheng
>>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>> --
>> Yours,
>> Zheng
>>
>
>
>
>



-- 
Yours,
Zheng
